ABSTRACT
Multiprocessor SoC systems have led to the increasing use of parallel hardware along with the associated software. The approaches taken have included coprocessor, homogeneous-processor (e.g., SMP), and application-specific architectures (e.g., DSP, ASIC). Application-specific instruction-set processors (ASIPs) have emerged as a viable alternative to conventional processing entities (PEs) due to their configurability and programmability. In this work, we introduce a heterogeneous multiprocessor system that uses ASIPs as processing entities in a pipeline configuration. A streaming application is manually partitioned into a series of algorithmic stages, each of which forms one stage of the pipeline. We formulate the problem of mapping each algorithmic stage in the system to an ASIP configuration, and propose a heuristic to efficiently search the design space for a pipeline-based multi-ASIP system.
We have implemented the proposed heterogeneous multiprocessor methodology using a commercial extensible processor (Xtensa LX from Tensilica Inc.). We evaluated the system with two benchmarks (MP3 and JPEG encoders), which were mapped to our proposed design platform. Our multiprocessor design provided a performance improvement of at least 4.11X (JPEG) and 3.36X (MP3) compared to the single-processor design. The minimum cost obtained through our heuristic was within 5.47% and 5.74% of the best possible values for the JPEG and MP3 benchmarks, respectively.
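To make the mapping problem in the abstract concrete, the following is a minimal sketch under stated assumptions, not the heuristic proposed in the paper. It assumes each pipeline stage has a small set of candidate ASIP configurations, each with a cycle count per data item and an area cost, and that the pipeline's steady-state period is set by its slowest stage. The stage data, the `target_period` constraint, and the function names are all hypothetical. A simple greedy strategy (repeatedly upgrading the bottleneck stage) is compared against exhaustive search.

```python
from itertools import product

# Hypothetical data: for each pipeline stage, candidate ASIP configurations
# as (cycles_per_item, area) pairs, ordered from slowest/cheapest to
# fastest/most expensive. The numbers are invented for illustration.
STAGES = [
    [(100, 10), (60, 18), (40, 30)],
    [(80, 12), (50, 20)],
    [(120, 8), (70, 15), (45, 28)],
]


def period(choice):
    """Steady-state pipeline period: the slowest stage sets the pace."""
    return max(STAGES[i][c][0] for i, c in enumerate(choice))


def area(choice):
    """Total area of the selected configurations (assumed cost metric)."""
    return sum(STAGES[i][c][1] for i, c in enumerate(choice))


def exhaustive(target_period):
    """Baseline: try every combination, keep the cheapest feasible one."""
    all_choices = product(*(range(len(s)) for s in STAGES))
    feasible = [c for c in all_choices if period(c) <= target_period]
    return min(feasible, key=area) if feasible else None


def greedy(target_period):
    """Assumed heuristic: start from the cheapest configuration of every
    stage and upgrade the current bottleneck stage until the target
    period is met, or report failure if the bottleneck cannot improve."""
    choice = [0] * len(STAGES)
    while period(choice) > target_period:
        bottleneck = max(range(len(STAGES)),
                         key=lambda i: STAGES[i][choice[i]][0])
        if choice[bottleneck] + 1 >= len(STAGES[bottleneck]):
            return None  # the slowest stage has no faster configuration
        choice[bottleneck] += 1
    return tuple(choice)


if __name__ == "__main__":
    best = exhaustive(70)
    found = greedy(70)
    print("exhaustive:", best, "area", area(best))
    print("greedy:    ", found, "area", area(found))
```

On this toy instance the greedy search visits only a handful of configurations, while the exhaustive baseline enumerates all 18 combinations. The gap in search effort grows quickly with the number of stages and configurations, which is the motivation for a heuristic in this kind of design-space exploration.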
Index Terms
- Design methodology for pipelined heterogeneous multiprocessor system