Low-power adaptive pipelined MPSoCs for multimedia: an H.264 video encoder case study

Authors: Muhammad Shafique, Sri Parameswaran, Jörg Henkel

DAC '11: Proceedings of the 48th Design Automation Conference

Pages 1032 - 1037

https://doi.org/10.1145/2024724.2024951

Published: 05 June 2011

Abstract

Pipelined MPSoCs provide a high-throughput implementation platform for multimedia applications, with reduced design time and improved flexibility. Typically, a pipelined MPSoC is balanced at design time using worst-case parameters. Where the workload varies widely, such designs consume an exorbitant amount of power. In this paper, we propose a novel adaptive pipelined MPSoC architecture that adapts itself to varying workloads. Our architecture consists of Main Processors and Auxiliary Processors with a distributed run-time balancing approach, in which each Main Processor, independently of the other Main Processors, decides for itself how many Auxiliary Processors it requires at run time, depending on its varying workload. The proposed run-time balancing approach is based on off-line statistical information along with workload prediction and run-time monitoring of current and previous workloads' execution times. We exploited the adaptability of our architecture through a case study on an H.264 video encoder supporting HD720p at 30 fps, where clock- and power-gating were used to deactivate idle Auxiliary Processors during low-workload periods. The results show that, compared to a design-time balanced pipelined MPSoC, an adaptive pipelined MPSoC provides energy savings of up to 34% and 40% for clock- and power-gating based deactivation of Auxiliary Processors, respectively, with a minimum throughput of 29 fps.
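
The abstract does not spell out the balancing algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical per-stage cycle budget, a simple two-sample average as the workload predictor, and ideal linear work sharing between a Main Processor and its Auxiliary Processors. The names MainProcessor, offline_avg_cycles, max_aux and budget_cycles are illustrative and do not come from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MainProcessor:
    """One pipeline stage that decides locally how many Auxiliary
    Processors to keep active (hypothetical model, not the paper's)."""
    offline_avg_cycles: float  # assumed off-line statistic for this stage
    max_aux: int               # Auxiliary Processors attached to this stage
    history: list = field(default_factory=list)

    def record(self, measured_cycles: float) -> None:
        """Run-time monitoring: store the measured execution time."""
        self.history.append(measured_cycles)

    def predict_cycles(self) -> float:
        """Predict the next workload from the two latest measurements,
        falling back to the off-line statistic when no history exists."""
        if not self.history:
            return self.offline_avg_cycles
        recent = self.history[-2:]
        return sum(recent) / len(recent)

    def required_aux(self, budget_cycles: float) -> int:
        """Smallest number of Auxiliary Processors that fits the predicted
        workload into the budget, assuming ideal linear work sharing."""
        workers = math.ceil(self.predict_cycles() / budget_cycles)
        # The Main Processor itself counts as one worker; clamp the rest
        # to the Auxiliary Processors that are physically available.
        return min(max(workers - 1, 0), self.max_aux)


if __name__ == "__main__":
    stage = MainProcessor(offline_avg_cycles=3_000_000, max_aux=3)
    budget = 1_000_000  # assumed per-stage cycle budget for one frame
    for measured in (800_000, 5_200_000, 9_000_000):
        active = stage.required_aux(budget)
        print(f"activate {active} auxiliary processor(s)")
        stage.record(measured)
```

Each MainProcessor instance keeps its own history and makes its decision without consulting other stages, which mirrors the distributed character described above. Auxiliary Processors beyond the returned count would be the candidates for clock- or power-gating.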

Published In

DAC '11: Proceedings of the 48th Design Automation Conference

June 2011

1055 pages

ISBN:9781450306362

DOI:10.1145/2024724

General Chair: Leon Stok (IBM Corp., Hopewell Jct., NY)

Program Chairs: Nikil Dutt (Univ. of California, Irvine, CA), Soha Hassoun (Tufts Univ., Medford, MA)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE Council on Electronic Design Automation (CEDA)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

DAC '11: The 48th Annual Design Automation Conference 2011

June 5 - 10, 2011

San Diego, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
