ABSTRACT
This paper presents a design space exploration flow that achieves energy efficiency for streaming applications on MPSoCs while meeting specified throughput constraints. The public-domain simulators Sim-Panalyzer and Cacti are used to estimate the energy dissipation of the parameterized architectural components. Our main contributions are threefold: we schedule the streaming applications on a multi-clock synchronous modeling framework, guarantee the application timing properties through throughput analysis, and customize both processor voltage-frequency levels and memory sizes in the design space to optimize the application pipeline parallelism for energy efficiency. Two widely used heuristic algorithms, greedy search and taboo search, drive the design optimization process. Our experiments show an energy reduction of 21% without any loss in application throughput compared with an ad-hoc approach.
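To make the idea of exploring voltage-frequency levels under a throughput constraint concrete, the following is a minimal sketch of a greedy search. It is not the authors' flow. The level table, the cycle counts, the pipeline throughput model (one stage per processor, rate set by the slowest stage), and the energy model (cycles times voltage squared, with capacitance normalized to 1) are all assumptions made for illustration, whereas the paper obtains energy figures from Sim-Panalyzer and Cacti. Memory sizing and the taboo search variant are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One voltage-frequency operating point (hypothetical values)."""
    freq_mhz: float
    volt: float


# Assumed operating points, ordered from slowest to fastest.
LEVELS = [Level(100, 0.8), Level(200, 1.0), Level(400, 1.2)]


def stage_time_us(cycles, level):
    # cycles / MHz gives microseconds per iteration of this stage.
    return cycles / level.freq_mhz


def throughput(cycles_per_stage, assignment):
    # Assumed pipeline model: the slowest stage sets the iteration rate.
    slowest = max(stage_time_us(c, LEVELS[i])
                  for c, i in zip(cycles_per_stage, assignment))
    return 1.0 / slowest  # iterations per microsecond


def total_energy(cycles_per_stage, assignment):
    # Assumed dynamic-energy model: cycles * V^2, capacitance normalized to 1.
    return sum(c * LEVELS[i].volt ** 2
               for c, i in zip(cycles_per_stage, assignment))


def greedy_explore(cycles_per_stage, min_throughput):
    """Lower one stage by one level per step, always taking the largest
    energy saving that still meets the throughput constraint."""
    assignment = [len(LEVELS) - 1] * len(cycles_per_stage)  # start fastest
    if throughput(cycles_per_stage, assignment) < min_throughput:
        raise ValueError("constraint infeasible even at the fastest levels")
    while True:
        current = total_energy(cycles_per_stage, assignment)
        best = None  # (saving, candidate assignment)
        for stage, level_index in enumerate(assignment):
            if level_index == 0:
                continue  # already at the slowest level
            candidate = assignment.copy()
            candidate[stage] -= 1
            if throughput(cycles_per_stage, candidate) < min_throughput:
                continue  # this move would violate the constraint
            saving = current - total_energy(cycles_per_stage, candidate)
            if best is None or saving > best[0]:
                best = (saving, candidate)
        if best is None:
            return assignment  # no feasible downgrade is left
        assignment = best[1]


if __name__ == "__main__":
    cycles = [400, 200, 100]  # hypothetical cycles per iteration, per stage
    chosen = greedy_explore(cycles, min_throughput=1.0)
    print(chosen, total_energy(cycles, chosen))
```

In this toy setup the bottleneck stage must stay at the fastest level, while the lighter stages can be slowed down until they just match it. That is the intuition behind trading per-processor voltage-frequency levels against a fixed throughput target. A greedy search of this kind can stall in a local minimum, which is one reason to pair it with a metaheuristic such as taboo search.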
Index Terms
- Energy efficient streaming applications with guaranteed throughput on MPSoCs