The Sesame modeling and simulation environment
Our system-level modeling and simulation environment is called Sesame. It facilitates performance analysis of embedded systems architectures according to the increasingly popular Y-chart design approach. This means that we recognize separate application and architecture models within a system simulation. An application model describes the functional behavior of an application, including both computation and communication behavior. An architecture model defines architecture resources and captures their performance constraints. After an application model has been explicitly mapped onto an architecture model, the two are co-simulated via trace-driven simulation. This allows for evaluation of the system performance of a particular application, mapping, and underlying architecture. Essential in this modeling methodology is that an application model is independent from architectural specifics, assumptions on hardware/software partitioning, and timing characteristics. As a result, a single application model can be used to exercise different hardware/software partitionings and can be mapped onto a range of architecture models, possibly representing different system architectures or simply modeling the same system architecture at various levels of abstraction. The layered infrastructure of Sesame is shown below.
For application modeling, Sesame uses the Kahn Process Network (KPN) model of computation in which parallel processes communicate with each other via unbounded FIFO channels. In the Kahn paradigm, reading from channels is done in a blocking manner, while writing is non-blocking. We use KPNs for application modeling because they nicely fit with the targeted media-processing application domain and they are deterministic. The latter implies that the same application input always results in the same application output, irrespective of the scheduling of the KPN processes. This provides us with a lot of scheduling freedom when mapping KPN processes onto architecture models for quantitative performance analysis.
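The Kahn semantics described above can be illustrated with a minimal sketch. It is not Sesame's execution engine (which runs C++ processes on Pthreads); it assumes Python threads and `queue.Queue` as stand-ins, and the process names `producer` and `doubler` as well as the `None` end-of-stream marker are hypothetical choices made for this example only.

```python
import queue
import threading


def producer(out_ch):
    # Writing is non-blocking: an unbounded queue never makes put() wait.
    for value in range(5):
        out_ch.put(value)
    out_ch.put(None)  # end-of-stream marker (an assumption of this sketch)


def doubler(in_ch, out_ch):
    while True:
        token = in_ch.get()  # blocking read: waits until a token is available
        if token is None:
            out_ch.put(None)
            return
        out_ch.put(2 * token)


def run_network():
    a_to_b = queue.Queue()    # maxsize=0 (default) means unbounded FIFO
    b_to_out = queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(a_to_b,)),
        threading.Thread(target=doubler, args=(a_to_b, b_to_out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Drain the output channel up to the end-of-stream marker.
    result = []
    while True:
        token = b_to_out.get()
        if token is None:
            break
        result.append(token)
    return result
```

Because each process only ever blocks on a read from a single channel, the output of `run_network()` does not depend on how the operating system interleaves the two threads, which is the determinism property the text refers to.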
The workload of an application is captured by (partly manually) instrumenting the code of each Kahn process with annotations that describe the application's computational and communication actions. By executing the Kahn model, these annotations cause the Kahn processes to generate traces of application events which subsequently drive the underlying architecture model. There are three types of application events: the communication events read and write, and the computational event execute. These application events typically are coarse grained, such as execute(DCT) or read(pixel-block,channel_id).
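As a rough illustration of such instrumentation, the sketch below annotates a toy process so that running it yields both its functional output and its event trace. The function name `encode_blocks`, the channel names, the tuple encoding of events, and the doubling operation standing in for a real DCT are all assumptions made for this example; Sesame's actual annotation interface is not shown in this text.

```python
def encode_blocks(blocks, in_ch="ch_in", out_ch="ch_out"):
    """Run a toy 'DCT' process and record the application events it generates."""
    trace = []    # coarse-grained application events, in program order
    outputs = []  # functional result of the process
    for block in blocks:
        trace.append(("read", "pixel-block", in_ch))
        if any(block):  # data-dependent control flow: all-zero blocks are skipped
            outputs.append([2 * p for p in block])  # stand-in for the real DCT
            trace.append(("execute", "DCT"))
            trace.append(("write", "pixel-block", out_ch))
    return outputs, trace
```

The data-dependent branch shows why the trace is obtained by actually executing the application: the sequence of events depends on the input data, so it cannot in general be derived from the code structure alone.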
To execute Kahn application models, and thereby generate the application events that represent the workload imposed on the architecture, Sesame features a process network execution engine supporting Kahn semantics. This execution engine runs the Kahn processes, which are written in C++, as separate threads using the Pthreads package. To allow for rapid creation and modification of models, the structure of the application models (i.e., which processes are used in the model and how they are connected to each other) is not hard-coded in the C++ implementation of the processes. Instead, it is described in a language called YML (Y-chart Modeling Language). YML is an XML-based language which also contains built-in scripting support. This allows for loop-like constructs, mapping and connectivity functions, and so on, which facilitate the description of large and complex models. In addition, it enables the creation of libraries of parameterized YML component descriptions that can be instantiated with the appropriate parameters, thereby fostering re-use of component descriptions. To simplify the use of YML even further, a YML editor has also been developed to compose model descriptions using a GUI. The figure below gives an impression of the YML editor's GUI, showing its layered layout (corresponding to Figure 1) and illustrating hierarchical decomposition of components.
Architecture models in Sesame, which typically operate at the so-called transaction level, simulate the performance consequences of the computation and communication events generated by an application model. These architecture models solely account for architectural performance constraints and do not need to model functional behavior. This is possible because the functional behavior is already captured in the application models, which subsequently drive the architecture simulation. An architecture model is constructed from generic building blocks provided by a library, which contains template performance models for processing cores, communication media (like buses) and various types of memory. The structure of architecture models -- specifying which building blocks are used from the library and the way they are connected -- is also described in YML.
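To make the idea of a performance-only model concrete, here is a minimal sketch of a processing-core component that looks up a latency per event and advances simulated time, without computing any functional result. The class name `ProcessorModel`, the method `handle`, and the latency values are hypothetical; they do not come from Sesame's component library.

```python
class ProcessorModel:
    """Performance-only model of a processing core (no functional behavior)."""

    def __init__(self, latencies):
        self.latencies = latencies  # event name -> latency in cycles (assumed values)
        self.busy_until = 0         # simulated time at which the core becomes free

    def handle(self, event, arrival_time):
        # An event starts when it arrives or when the core is free, whichever is later.
        start = max(arrival_time, self.busy_until)
        self.busy_until = start + self.latencies[event]
        return self.busy_until      # completion time of the event
```

With the made-up latencies `{"DCT": 100, "quantize": 20}`, a `DCT` event arriving at time 0 completes at 100, and a `quantize` event arriving at time 50 has to wait for the core and completes at 120.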
Sesame's architecture models are implemented using either Pearl or SystemC. Pearl is a small but powerful discrete-event simulation language which provides easy construction of the models and fast simulation. For our SystemC architecture models, we provide an add-on library to SystemC, called SCPEx (SystemC Pearl Extension), which extends SystemC's programming model with Pearl's message-passing paradigm and which provides SystemC with YML support. SCPEx raises the abstraction level of SystemC models, thereby reducing the modeling effort required for developing transaction-level architecture models and making the modeling process less prone to programming errors.
To map Kahn processes (i.e., their event traces) from an application model onto architecture model components and to support the scheduling of application events from different event traces when multiple Kahn processes are mapped onto a single architecture component (e.g., a programmable processor), Sesame provides an intermediate mapping layer. This layer consists of virtual processor components and FIFO buffers for communication between the virtual processors. There is a one-to-one relationship between the Kahn processes in the application model and the virtual processors in the mapping layer. This is also true for the Kahn channels and the FIFO buffers in the mapping layer, except that the latter are limited in size. Their size is parameterized and dependent on the modeled architecture. As the structure of the mapping layer closely resembles the structure of the application model under investigation, Sesame provides a tool that is able to automatically generate the mapping layer from the YML description of an application model.
A virtual processor in the mapping layer reads in an application trace from a Kahn process via a trace event queue and dispatches the events to a processing component in the architecture model. The mapping of a virtual processor onto a processing component in the architecture model is freely adjustable, facilitated by the fact that the mapping layer and its mapping onto the architecture model are also described in YML (and manipulated using the YML editor, see above). Communication channels -- i.e., the buffers in the mapping layer -- are also mapped onto the architecture model. In the example above, one buffer is placed in shared memory while the other buffer is mapped onto a point-to-point FIFO channel between processors 1 and 2.
The mechanism used to dispatch application events from a virtual processor to an architecture model component guarantees deadlock-free scheduling of the application events from different event traces. In this mechanism, computation events are always directly dispatched by a virtual processor to the architecture component onto which it is mapped. The latter schedules incoming events that originate from different event queues according to a given policy (FCFS, round-robin, or customized) and subsequently models their timing consequences. Communication events, however, are not directly dispatched to the underlying architecture model. Instead, a virtual processor that receives a communication event first consults the appropriate buffer at the mapping layer to check whether or not the communication is safe to take place so that no deadlock can occur. Only if it is found to be safe (i.e., for read events the data is available, and for write events there is room in the target buffer) is the communication event dispatched. As long as a communication event cannot be dispatched, the virtual processor blocks. This is possible because the mapping layer executes in the same simulation as the architecture model. Therefore, both the mapping layer and the architecture model share the same simulation-time domain. This also implies that each time a virtual processor dispatches an application event (either computation or communication) to an architecture model component, the virtual processor is blocked in simulated time until the event's latency has been simulated by the architecture model. In other words, virtual processors can be seen as abstract representations of application processes at the system architecture level.
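The safety check on communication events can be sketched as follows. This is a deliberately simplified, untimed illustration: it omits the architecture model, event latencies, and the scheduling policy, and it updates the buffer state at dispatch time. The class names `BoundedBuffer` and `VirtualProcessor`, the tuple encoding of events, and the round-based `simulate` loop are assumptions of this sketch, not Sesame's implementation.

```python
from collections import deque


class BoundedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity  # size is a parameter of the modeled architecture
        self.tokens = 0           # number of tokens currently stored


class VirtualProcessor:
    def __init__(self, name, trace, buffers):
        self.name = name
        self.trace = deque(trace)  # trace event queue fed by one Kahn process
        self.buffers = buffers     # channel name -> BoundedBuffer

    def is_safe(self, event):
        kind = event[0]
        if kind == "execute":
            return True            # computation events are always dispatched
        buf = self.buffers[event[1]]
        if kind == "read":
            return buf.tokens > 0  # data must be available
        return buf.tokens < buf.capacity  # write: there must be room

    def step(self):
        """Dispatch the next event if it is safe; return it, else None."""
        if not self.trace or not self.is_safe(self.trace[0]):
            return None            # finished, or blocked on a communication event
        event = self.trace.popleft()
        if event[0] == "read":
            self.buffers[event[1]].tokens -= 1
        elif event[0] == "write":
            self.buffers[event[1]].tokens += 1
        return event


def simulate(vps):
    """Visit the virtual processors round by round until none can progress."""
    dispatched = []
    progress = True
    while progress:
        progress = False
        for vp in vps:
            event = vp.step()
            if event is not None:
                dispatched.append((vp.name, event))
                progress = True
    return dispatched


def demo():
    buffers = {"ch": BoundedBuffer(capacity=1)}
    a = VirtualProcessor("A", [("execute", "produce"), ("write", "ch"), ("write", "ch")], buffers)
    b = VirtualProcessor("B", [("read", "ch"), ("execute", "consume"), ("read", "ch")], buffers)
    return simulate([b, a])  # B is visited first and initially blocks on its read
```

In `demo()`, virtual processor B cannot dispatch its first read until A has written a token, and A's second write is held back until B has freed the single buffer slot, so both traces complete without either side dispatching an unsafe communication.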
When architecture model components are gradually refined to disclose more implementation details, Sesame follows an approach in which the virtual processors at the mapping layer are also refined. The latter is done by incorporating dataflow graphs in virtual processors, which allows us to perform architectural simulation at multiple levels of abstraction without modifying the application model. The example above illustrates this dataflow-based refinement by refining the virtual processor for process B with a fictitious dataflow graph. In this approach, the application event traces specify what a virtual processor executes and with whom it communicates, while the internal dataflow graph of a virtual processor specifies how the computations and communications take place at the architecture level. In our publications, we provide more insight on how this refinement approach works by explaining the relation between trace transformations for refinement and dataflow actors at the mapping layer.
Mapping support using multi-objective optimization
To facilitate effective design space exploration, Sesame provides some (initial) support for finding promising candidate application-to-architecture mappings to guide a designer during the system-level simulation stage. To this end, we have developed a mathematical model that captures several trade-offs faced during the process of mapping. In this model, we take into account the computational and communication demands of an application as well as the properties of an architecture, in terms of computational and communication performance, power consumption, and cost. The resulting trade-offs with respect to performance, power consumption and cost are formulated as a multi-objective combinatorial optimization problem. Using an optimization software tool, which is based on a widely-known evolutionary algorithm, the mapping decision problem is solved by providing the designer with a set of approximated Pareto-optimal mapping solutions that can be further evaluated using system-level simulation.
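The notion of Pareto-optimal mappings can be illustrated with a small non-dominated filter. Note that this is an exhaustive comparison over a handful of candidates, not the evolutionary algorithm mentioned above, and that the mapping names and objective values (execution time, power, cost, all to be minimized) are invented for the example.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the sorted names of the candidates not dominated by any other candidate.

    candidates maps a mapping name to a tuple of objective values to minimize.
    """
    return sorted(
        name
        for name, objs in candidates.items()
        if not any(
            dominates(other, objs)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )


# Hypothetical candidate mappings: (execution time, power, cost).
EXAMPLE_MAPPINGS = {
    "m1": (10, 5, 3),
    "m2": (8, 7, 3),
    "m3": (12, 6, 4),
    "m4": (8, 7, 2),
}
```

For `EXAMPLE_MAPPINGS`, `pareto_front` returns `["m1", "m4"]`: `m3` is dominated by `m1`, and `m2` is dominated by `m4`, while `m1` and `m4` trade execution time against power and are therefore both kept for further evaluation by simulation.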