Simulink®-based heterogeneous multiprocessor SoC design flow for mixed hardware/software refinement and simulation

doi:10.1016/j.vlsi.2008.08.003

Integration

Volume 42, Issue 2, February 2009, Pages 227-245

https://doi.org/10.1016/j.vlsi.2008.08.003

Abstract

To deal with the design complexity of multiprocessor SoC architectures, we present a joint Simulink-SystemC design flow that enables mixed hardware/software refinement and simulation early in the design process. First, we introduce the Simulink combined algorithm/architecture model (CAAM), which unifies the algorithm and the abstract target architecture. From the Simulink CAAM, a hardware architecture generator produces architecture models at three different abstraction levels, enabling a trade-off between simulation time and accuracy. A multithread code generator produces memory-efficient multithreaded programs to be executed on the architecture models. To show the applicability of the proposed design flow, we present experimental results on two real video applications.

Introduction

Current embedded systems require flexible, high-performance architectures to run multiple applications concurrently. An attractive architecture for these systems is the heterogeneous multiprocessor SoC (MPSoC), which provides highly concurrent computation and flexible programmability [1]. Recent platforms such as CT3600™ [2] and Cell™ [3] are examples of heterogeneous multiprocessor architectures with 10–20 heterogeneous processors. The CRS-1 network router [4], based on an array of 192 configurable processor cores, also illustrates this trend.

A typical multiprocessor architecture includes a set of CPU and memory subsystems (SS) interconnected via a communication network [5], as depicted in Fig. 1(c). The CPU subsystem includes one or more different kinds of processors (e.g. DSP for data-oriented operations, GPP for control-oriented operations or ASIP for application-specific computation), specific hardware components, and specific I/O communication, as shown in Fig. 1(d). The heterogeneity of processors implies the need for multiple software stacks that may require different computation and communication performance. The software stack is organized in three layers, as shown in Fig. 1(e): application software, hardware-dependent software (HdS), and the hardware abstraction layer (HAL) [6]. The application software may be a multithreaded application description, which uses high-level primitives (HdS API) to abstract the underlying platform. The HdS, which is made up of a thread library and a specific I/O communication library, provides the application software with architecture-independent services (HdS API) such as thread scheduling and communication between different threads. The HAL provides architecture-specific services (HAL API), such as context switching, interrupt service routines, specific hardware components, and specific I/O controls.
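The layering described above can be illustrated with a minimal sketch. It is not the paper's HdS or HAL implementation: the class names `Hal` and `Hds`, the method names, the channel name `ch0`, and the run-to-completion scheduling policy are all assumptions made for illustration. The point is only that application threads call architecture-independent services, while the architecture-specific part stays behind a separate interface.

```python
from collections import deque


class Hal:
    """Hypothetical HAL: architecture-specific services for one CPU."""

    def __init__(self, cpu_name):
        self.cpu_name = cpu_name
        self.switch_log = []  # records every context switch request

    def context_switch(self, old_thread, new_thread):
        # A real HAL would save and restore registers here.
        self.switch_log.append((self.cpu_name, old_thread, new_thread))


class Hds:
    """Hypothetical HdS: thread library plus communication library."""

    def __init__(self, hal):
        self._hal = hal
        self._ready = deque()   # threads waiting to run
        self._channels = {}     # channel name -> FIFO of values
        self._current = None

    def create_thread(self, name, body):
        self._ready.append((name, body))

    def send(self, channel, value):
        self._channels.setdefault(channel, deque()).append(value)

    def recv(self, channel):
        return self._channels[channel].popleft()

    def run(self):
        # Simplified policy (an assumption): each thread runs to completion.
        while self._ready:
            name, body = self._ready.popleft()
            self._hal.context_switch(self._current, name)
            self._current = name
            body(self)


def run_demo(cpu_name):
    """Application layer: two threads that only use the HdS API."""
    received = []

    def producer(hds):
        for sample in (1, 2, 3):
            hds.send("ch0", sample * 10)

    def consumer(hds):
        for _ in range(3):
            received.append(hds.recv("ch0"))

    hal = Hal(cpu_name)
    hds = Hds(hal)
    hds.create_thread("producer", producer)
    hds.create_thread("consumer", consumer)
    hds.run()
    return received, hal.switch_log
```

Calling `run_demo("dsp0")` and `run_demo("gpp0")` runs the same application threads on two different `Hal` instances; only the HAL log differs, which is the separation of concerns the three-layer stack is meant to provide.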

As the complexity of embedded applications and systems grows, heterogeneous multiprocessor architectures integrate an increasing number of processors and hardware components. Designing and programming such complex multiprocessor architectures is becoming a major challenge in embedded system development. In conventional design approaches, hardware and software are usually considered separately, and hardware/software integration is done only at a late stage of the embedded system design process, when the hardware for the SoC is fully defined. The lack of early coordination between the design of hardware and software causes unacceptable delay and cost overheads. The most effective method for improving design efficiency is raising the level of abstraction to enable design space exploration and hardware/software codesign from the early stages of the SoC design process [7], [8].

SystemC [9] has become the preferred hardware–software codesign language, because it enables one to specify and simulate both software and hardware within a wide range of abstraction levels [10]. ROSES [11] and COSY [12] are examples of SystemC-based hardware/software codesign environments for SoC architecture. These design tools take a high-level SystemC model in which abstract modules, each embedding a set of functions, communicate via abstract communication channels, and they gradually refine this set of functions into a detailed hardware and software architecture. However, these SystemC-based approaches have several limitations. First, a SystemC model is built manually after hardware–software partitioning. This manual build is error-prone and time-consuming, which limits design space exploration. Second, SystemC is not a popular language among system designers for describing complex systems at the algorithm level of abstraction. Finally, the fixed sequence of function calls within a module limits software code optimizations such as buffer memory minimization. One needs to raise the abstraction level to the algorithm level in order to overcome these limitations.

On the other side of the automation spectrum, Simulink [13] has been widely adopted as the prevailing environment for modeling and simulating complex systems at an algorithm level of abstraction. Designers can easily build algorithm models by assembling user or predefined functional blocks via a graphical user interface. Real-Time Workshop™ (RTW) and Simulink HDL Coder, tools provided by the environment, can automatically generate software and hardware from the algorithm model, respectively [13]. However, mapping and refining algorithm models onto complete MPSoCs are open issues in the Simulink community.

This paper presents an MPSoC design flow that enables systematic and automated MPSoC design from a high-level specification using the Simulink environment and the SystemC language. More specifically, the proposed design flow allows a system designer to specify a system in Simulink both at the algorithm level of abstraction and as a high-level mixed hardware–software system, as shown in Fig. 1(a) and (b), respectively, and then to refine it automatically to the targeted MPSoC hardware and software using SystemC, as shown in Fig. 1(c)–(e). In this way, one benefits from Simulink for the higher-level modeling and from SystemC at the lower levels.

For seamless hardware–software refinement, we first defined a Simulink combined algorithm/architecture model (CAAM) to specify the abstract hardware and software architecture. This Simulink CAAM is the first level of mixed hardware–software model. From the Simulink CAAM, a hardware architecture generator produces architecture models at three different abstraction levels: (1) virtual architecture for early development and validation of the multithreaded application software, (2) transaction-accurate model for fast verification of hardware architecture and OS library, and (3) virtual prototype for accurate system verification and performance estimation [6]. The flow then continues with a multithread code generator that, from the Simulink CAAM, builds software stacks executable on the generated architecture models at the different abstraction levels. The multithread code generator applies copy removal and buffer sharing techniques to produce memory-efficient thread code [14].
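To make the idea of buffer sharing concrete, the following sketch shows one generic way it can work: buffers whose lifetimes in the thread schedule do not overlap are assigned to the same memory slot. This is a simple greedy illustration, not the technique of [14], whose details are not given here. The function name `share_buffers`, the tuple layout, and the example buffer names and sizes are hypothetical.

```python
def share_buffers(buffers):
    """Assign buffers with non-overlapping lifetimes to shared slots.

    buffers: list of (name, size, first_step, last_step), where the
    lifetime [first_step, last_step] is inclusive and counted in
    schedule steps. Returns (assignment, slot_sizes).
    """
    slots = []        # each slot: {"size": int, "busy_until": int}
    assignment = {}   # buffer name -> slot index

    # Visit buffers in order of the step at which they become live.
    for name, size, first, last in sorted(buffers, key=lambda b: b[2]):
        for index, slot in enumerate(slots):
            if slot["busy_until"] < first:
                # The slot's previous occupant is dead: reuse the slot.
                slot["size"] = max(slot["size"], size)
                slot["busy_until"] = last
                assignment[name] = index
                break
        else:
            # No free slot: allocate a new one.
            slots.append({"size": size, "busy_until": last})
            assignment[name] = len(slots) - 1

    return assignment, [slot["size"] for slot in slots]


# Hypothetical three-stage pipeline: each buffer is written in one
# step and read in the next one.
example = [
    ("vld_out", 64, 0, 1),
    ("iq_out", 64, 1, 2),
    ("idct_out", 64, 2, 3),
]
assignment, slot_sizes = share_buffers(example)
```

In this example `vld_out` and `idct_out` end up in the same slot, so the three 64-word buffers need 128 words instead of 192. Copy removal, the other technique named above, is not covered by this sketch.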

The major contribution of this work is an MPSoC design flow that starts from an algorithm-level specification and is based on mixed hardware–software refinement. The secondary contribution is the set of memory optimization techniques applied during multithread code generation and integrated into the proposed design flow. Although the current design flow does not support automatic parallelization techniques, different architectures and algorithm mappings can be evaluated in a fraction of the time that fully manual work would require: designers can easily capture high-level mixed hardware–software models from a Simulink algorithm model through the graphical user interface and evaluate the generated target MPSoC at the implementation level in a short amount of time. This paper includes experimental results and analysis with a Motion-JPEG decoder and an H.264 video decoder as test cases to show the effectiveness of the proposed design flow.

The paper is organized as follows. Section 2 describes related work on MPSoC design flow. Section 3 explains the proposed design flow and the details of the mixed hardware–software abstraction levels. Sections 4 and 5 describe the hardware architecture generator and the multithread code generator, respectively. Section 6 summarizes our experimental results and analysis. Section 7 concludes and highlights directions for future work.

Section snippets

Related work

The MPSoC design environments can be classified according to the system specification language and refinement methodology. Simulink, which supports high-level system specification, simulation, and hardware/software code generation, has been widely used to specify complex systems at an algorithm level of abstraction. However, most tools for Simulink-based design have been good at automatically generating only software for limited architectures or only hardware at the arithmetic level. For

Design flow

Traditional design flow makes use of two separate models: application and architecture. The application is generally specified as a model composed of a set of multiple cooperating threads, each of which performs a subset of the functions of the application. These multiple threads are mapped onto the target architecture, which is specified as a set of processor SS that interact via a communication network. Our design flow allows the system designer to derive these two models in a mixed manner at

Hardware architecture generator

The Hardware architecture generator builds an MPSoC hardware description at each proposed abstraction level in two stages, CPU subsystem generation and system architecture generation, as shown in the flow illustrated in Fig. 8. In the first stage, the Hardware architecture generator produces a set of subsystem models, each of which corresponds to a CPU subsystem in the input Colif CAAM. In the second stage, the Hardware architecture generator produces a system architecture code that

Multithread code generator

The Multithread code generator takes a Colif CAAM, generates a set of software thread codes and builds software stacks executing on the generated hardware architecture. The major challenges are to maximize the efficiency of the generated code while maintaining the flexibility to adapt codes for different processors, communication protocols, and abstraction levels.
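The flexibility requirement above can be pictured with a toy template-based generator, sketched here under stated assumptions. It is not the paper's code generator: the function `generate_thread_code`, the step tuples, and all communication primitive names in `COMM_PRIMITIVES` are hypothetical, and only the names of the abstraction levels come from the text. The sketch emits the same thread body for different abstraction levels and swaps only the communication primitives.

```python
# Hypothetical communication primitive names for each abstraction level.
COMM_PRIMITIVES = {
    "virtual_architecture": {"recv": "va_channel_read",
                             "send": "va_channel_write"},
    "transaction_accurate": {"recv": "ta_fifo_read",
                             "send": "ta_fifo_write"},
    "virtual_prototype": {"recv": "hds_fifo_read",
                          "send": "hds_fifo_write"},
}


def generate_thread_code(thread_name, steps, level):
    """Emit C source text for one thread.

    steps is a list of tuples:
      ("recv", channel, buffer_name)
      ("call", function_name, [argument names])
      ("send", channel, buffer_name)
    """
    primitives = COMM_PRIMITIVES[level]
    statements = []
    for step in steps:
        kind = step[0]
        if kind in ("recv", "send"):
            _, channel, buffer_name = step
            statements.append(
                f"{primitives[kind]}({channel}, {buffer_name});")
        elif kind == "call":
            _, function_name, arguments = step
            statements.append(
                f"{function_name}({', '.join(arguments)});")
        else:
            raise ValueError(f"unknown step kind: {kind}")

    lines = [f"void {thread_name}(void)", "{", "    for (;;) {"]
    lines += ["        " + statement for statement in statements]
    lines += ["    }", "}"]
    return "\n".join(lines)


# Hypothetical thread: read a block, transform it, write the result.
steps = [
    ("recv", "ch_in", "blk"),
    ("call", "idct", ["blk", "out"]),
    ("send", "ch_out", "out"),
]
code = generate_thread_code("thread_idct", steps, "virtual_prototype")
```

Generating the same `steps` for `"virtual_architecture"` changes only the two communication calls, while the computation call `idct(blk, out);` stays identical.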

Fig. 10 shows the global flow of the multithread code generation that produces a set of memory-efficient thread C codes, a main C code

Experimental results

To check the effectiveness of the proposed design flow, we applied it to two real applications: a Motion-JPEG decoder and an H.264 baseline decoder. First, we developed a Simulink algorithm model for the Motion-JPEG decoder and one for the H.264 baseline decoder, and validated their functionalities with the Simulink simulation environment. After that, we transformed these algorithm models into Simulink CAAMs according to the chosen platforms, which are explained in 6.1 Motion-JPEG decoder case,

Conclusion

To cope with the design complexity of MPSoC architecture, we propose a Simulink-SystemC-based design flow. The proposed design flow allows a designer to specify the target system at both an algorithm level and a high-level mixed hardware–software level in Simulink, and then automatically refine it to the detailed hardware and software using SystemC. First, this paper introduces a mixed hardware–software model called Simulink combined algorithm/architecture model (CAAM) as the first abstract

Sang-Il Han received the B.S., M.S., and Ph.D. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 1999, 2001, and 2008, respectively. He is currently with the SoC Division, GCT Semiconductor, Seoul, Korea. His research interests include electronic system level (ESL) design methodologies, communication architecture design, and high-performance multimedia system design.

References (50)

A.A. Jerraya et al., Guest Editors’ introduction: multiprocessor systems-on-chips, IEEE Comput. (2005)
Cradle, Inc., CT3600 Family™...
IBM, Inc., Cell™...
Cisco, Inc., CRS-1 carrier router system...
D. Culler et al., Parallel Computer Architecture: A Hardware/Software Approach (1998)
A.A. Jerraya, A. Bouchhima, F. Petrot, Programming models and HW–SW interfaces abstraction for multi-processor SoC, in:...
K. Keutzer et al., System-level design: orthogonalization of concerns and platform-based design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2000)
A.A. Jerraya et al., Hardware/software interface codesign for embedded systems, Computer (2005)
T. Grotker et al., System Design with SystemC (2002)
Open SystemC Initiative, available at...
W. Cesario et al., Multiprocessor SoC platforms: a component-based design approach, IEEE Des. Test Comput. (2002)
J.-Y. Brunel, W.M. Kruijtzer, H.J.H.N. Kenter, F. Petrot, L. Pasquier, E.A. de Kock, W.J.M. Smits, COSY Communication...
Mathworks, Inc....
S.-I. Han, G. Guerin, S.-I. Chae, A.A. Jerraya, Buffer memory optimization for video codec application modeled in...
Dspace, Inc., RTI-MP...
Xilinx, Inc., System Generator...
Altera, Inc., DSP Builder...
J. Ou et al., Design space exploration using arithmetic level hardware–software co-simulation for configurable multi-processor platforms, ACM Trans. Embed. Comput. Syst. (2005)
T. Kempf, M. Doerper, R. Leupers, G. Ascheid, H. Meyr, T. Kogel, B. Vanthournout, A modular simulation framework for...
Coware, Inc., ConvergenSC...
Summit Design, Inc., Visual Elite ESC...
Synopsys, Inc., Virtio...
ARM, Inc., RealView MaxSim...
Ptolemy Project, 2006...
S. Ha et al., Hardware–software codesign of multimedia embedded systems: the PeaCE


Soo-Ik Chae received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 1976 and 1978, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, California, in 1987. He was an Instructor in the Electronics Department, Korea Air Force Academy, from 1978 to 1982. He worked as a Manager in the ASIC design group of ZyMOS Corporation and Daewoo Telecom from 1987 to 1990. He joined the Inter-University Semiconductor Research Center and the Department of Electrical Engineering, Seoul National University. His research interests are ultra-low-energy circuits, VLSI system designs, multimedia system design, and system level design methodology. He is also a member of the IEEE.

Lisane Brisolara graduated from the Catholic University of Pelotas in Computer Science in 1999. She received the M.Sc. and Dr. degrees from the Federal University of Rio Grande do Sul, UFRGS, Brazil, in 2002 and 2007, respectively, all in Computer Science. She is presently a professor at the Computer Science Department at the Federal University of Pelotas (UFPel), in charge of Software Engineering and Information Systems disciplines at the undergraduate level. Her research interests include embedded systems modeling, design, validation, automation and test, and embedded software development.

Luigi Carro received the Dr. degree from the Universidade Federal do Rio Grande do Sul (UFRGS), Brazil, in 1996 after a period working at ST-Microelectronics (1989 to 1991), Agrate, Italy, in the R&D group. He is presently a professor at the Applied Informatics Department at the Informatics Institute of UFRGS. His primary research interests include embedded systems design, validation, automation and test, fault tolerance for future technologies, and rapid system prototyping. He has advised more than 20 graduate students (Master and Dr. levels). He has published more than 150 technical papers on those topics and is the author of the book “Digital Systems Design and Prototyping” (2001, in Portuguese) and co-author of “Fault-Tolerance Techniques for SRAM-based FPGAs” (2006, Springer).

Katalin Popovici received her Engineer Degree in Computer Science from the University of Oradea, Romania in 2004 and her Ph.D. in Micro and Nano Electronics from the Grenoble Institute of Technology, France in 2008. Her research interests include system level modeling and design of MPSoC, programming models, and code generation for embedded multimedia applications. Dr. Katalin Popovici joined The Mathworks, Inc. in April 2008, where she is working as Senior Software Engineer in the Simulink Core development team.

Xavier Guérin received an M.S. degree in Computer Science from the Université Joseph Fourier, Grenoble, France, and is currently a third-year doctorate student at the SLS Group of the TIMA Laboratory, Grenoble, France. His research interests include Operating System architecture, parallel computation, and embedded software design.

Ahmed Jerraya graduated from the University of Tunis in 1980 and received the “Docteur Ingénieur” and the “Docteur d’Etat” degrees from the University of Grenoble in 1983 and 1989, respectively, all in computer sciences. From April 1990 to March 1991, he was a Member of the Scientific Staff at Nortel in Canada. Dr. Jerraya attained the grade of Research Director within CNRS (French National Research Center). He was General Chair for the DATE Conference in 2001. He is the Director of Strategic Design Programs at CEA-LETI, one of the largest European nanoelectronics research institutes.

Kai Huang was born in November 1980. He received the BSEE from Nanchang University, China, in 2002, and the Ph.D. in Engineering Circuit and System from Zhejiang University, China, in 2008. Since September 2008, he has worked as a postdoctoral researcher at the Institute of VLSI Design of Zhejiang University. His research interests include processor and SoC system-level design methodology and platform.

Lei Li received the Ph.D. degree in Electronic Systems from Zhejiang University in 2007. He was with the Institute of VLSI Design, Zhejiang University, and is now affiliated with STMicroelectronics as a design engineer. His research interests include communication systems between multiprocessors, secure data transfer for networks on chip, information encryption/decryption algorithms, and digital circuit design.

Xiaolang Yan was born in January 1947. He obtained the BSEE and MSEE from Zhejiang University, China, in 1968 and 1981, respectively. From September 1993 to May 1994, he was a Visiting Scholar at Stanford University, Palo Alto, CA, USA. From 1994 to 1999, he was a professor and dean of the Hangzhou Institute of Electronic Engineering, Hangzhou, China. Since October 1999, he has been a professor, dean of the Information Science and Engineering College, and director of the Institute of VLSI Design, Zhejiang University. Prof. Yan's current research interests include VLSI/SoC design, IC design methodology, and design for manufacturability.
