ABSTRACT
Pipelined MPSoCs provide a high-throughput implementation platform for multimedia applications, with reduced design time and improved flexibility. Typically, a pipelined MPSoC is balanced at design time using worst-case parameters. When the workload varies widely, such designs consume an exorbitant amount of power. In this paper, we propose a novel adaptive pipelined MPSoC architecture that adapts itself to varying workloads. Our architecture consists of Main Processors and Auxiliary Processors with a distributed run-time balancing approach. In this approach, each Main Processor, independently of the other Main Processors, decides at run time how many Auxiliary Processors it requires based on its varying workload. The run-time balancing approach combines off-line statistical information with workload prediction and run-time monitoring of the execution times of current and previous workloads. We exploited the adaptability of our architecture through a case study on an H.264 video encoder supporting HD720p at 30 fps. In this case study, clock gating and power gating were used to deactivate idle Auxiliary Processors during low-workload periods. Compared with a design-time balanced pipelined MPSoC, the adaptive pipelined MPSoC provides energy savings of up to 34% with clock-gating-based deactivation and up to 40% with power-gating-based deactivation of Auxiliary Processors, with a minimum throughput of 29 fps.
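The per-Main-Processor decision described above can be illustrated with a minimal sketch. It rests on several assumptions, none taken from the paper:

- The class name `MainProcessorBalancer` and its parameters (`budget_s`, `max_aux`, `offline_mean_s`, `alpha`) are hypothetical.
- An exponential moving average stands in for the authors' workload predictor.
- Work is assumed to split evenly, with no overhead, across the Main Processor and its active Auxiliary Processors.

The sketch only shows the shape of the idea. Each Main Processor seeds its prediction from an off-line statistic, refines it with monitored execution times, and keeps just enough Auxiliary Processors active to meet a per-workload time budget. For HD720p at 30 fps, that budget is 1/30 s per frame.

```python
import math
from dataclasses import dataclass


@dataclass
class MainProcessorBalancer:
    """Hypothetical run-time balancer owned by a single Main Processor.

    Each instance decides on its own how many Auxiliary Processors to keep
    active; it never consults other Main Processors (distributed balancing).
    """
    budget_s: float        # time allowed per workload, e.g. 1/30 s per frame at 30 fps
    max_aux: int           # Auxiliary Processors attached to this Main Processor
    offline_mean_s: float  # off-line statistic: mean single-processor execution time
    alpha: float = 0.5     # weight given to the most recently monitored workload

    def __post_init__(self) -> None:
        # Before any run-time monitoring data exists, use the off-line statistic.
        self.estimate_s = self.offline_mean_s

    def observe(self, measured_s: float, active_aux: int) -> None:
        """Fold a monitored execution time into the prediction (assumed EMA).

        The measured time is scaled to a single-processor equivalent, assuming
        the work split evenly across 1 + active_aux processors.
        """
        single_proc_s = measured_s * (1 + active_aux)
        self.estimate_s = (self.alpha * single_proc_s
                           + (1 - self.alpha) * self.estimate_s)

    def required_aux(self) -> int:
        """Number of Auxiliary Processors needed for the predicted next workload.

        Processors beyond this count are idle and are candidates for clock or
        power gating.
        """
        processors = math.ceil(self.estimate_s / self.budget_s)
        return min(max(processors - 1, 0), self.max_aux)


if __name__ == "__main__":
    balancer = MainProcessorBalancer(budget_s=1 / 30, max_aux=3, offline_mean_s=0.05)
    print(balancer.required_aux())  # 0.05 s of work in a 1/30 s budget -> 1 Auxiliary Processor
    balancer.observe(0.045, active_aux=1)  # heavier frame: prediction rises to 0.07 s
    print(balancer.required_aux())
```

Because every Main Processor runs its own instance, stages whose workload drops can gate their Auxiliary Processors without waiting on any global decision. The trade-off is that an under-prediction can leave a stage briefly short of processors for a frame.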
Low-power adaptive pipelined MPSoCs for multimedia: an H.264 video encoder case study