Article

A low power front-end for embedded processors using a block-aware instruction set

Authors:
Ahmad Zmily (Stanford University), Christos Kozyrakis (Stanford University)

CASES '07: Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, September 2007, Pages 267–276. https://doi.org/10.1145/1289881.1289926

Published: 30 September 2007

ABSTRACT

Energy, power, and area efficiency are critical design concerns for embedded processors. Much of the energy of a typical embedded processor is consumed in the front-end since instruction fetching happens on nearly every cycle and involves accesses to large memory arrays such as instruction and branch target caches. The use of small front-end arrays leads to significant power and area savings, but typically results in significant performance degradation. This paper evaluates and compares optimizations that improve the performance of embedded processors with small front-end caches. We examine both software techniques, such as instruction re-ordering and selective caching, and hardware techniques, such as instruction prefetching, tagless instruction cache, and unified caches for instruction and branch targets. We demonstrate that, building on top of a block-aware instruction set, these optimizations can eliminate the performance degradation due to small front-end caches. Moreover, selective combinations of these optimizations lead to an embedded processor that performs significantly better than the large cache design while maintaining the area and energy efficiency of the small cache design.
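The trade-off described in the abstract, where a smaller instruction cache saves area and energy but misses more often, can be illustrated with a toy model. The sketch below is not the paper's simulator or any of its mechanisms: it assumes a direct-mapped cache, a hypothetical fetch trace (a 4 KB loop body executed ten times), and an idealized zero-latency next-line prefetcher standing in for the instruction prefetching techniques the paper evaluates. The function name, cache sizes, and trace are all assumptions chosen for illustration.

```python
def simulate(trace, num_lines, line_bytes=32, next_line_prefetch=False):
    """Count misses in a direct-mapped instruction cache (toy model)."""
    tags = [None] * num_lines          # one stored block number per cache line
    misses = 0
    for addr in trace:
        block = addr // line_bytes     # which memory block holds this fetch
        index = block % num_lines      # direct-mapped placement
        if tags[index] != block:
            misses += 1
            tags[index] = block        # fill the line on a miss
        if next_line_prefetch:
            # Idealized assumption: the next sequential block arrives instantly.
            nxt = block + 1
            tags[nxt % num_lines] = nxt
    return misses


# Hypothetical trace: 4 KB of straight-line 4-byte instructions, looped 10 times.
loop_body = list(range(0, 4096, 4))
trace = loop_body * 10

small = simulate(trace, num_lines=64)       # 2 KB cache: loop does not fit
large = simulate(trace, num_lines=256)      # 8 KB cache: loop fits
small_pf = simulate(trace, num_lines=64, next_line_prefetch=True)

print(small, large, small_pf)
```

In this contrived trace the small cache misses on every block in every pass, the large cache takes only cold misses, and the small cache with the idealized prefetcher misses only once per pass. Real workloads and real prefetch latency would narrow that gap, so the numbers indicate only the direction of the effect.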

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. Hardware
  1. Communication hardware, interfaces and storage

Published in
CASES '07: Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
September 2007
292 pages
ISBN: 9781595938268
DOI: 10.1145/1289881
General Chairs: Taewhan Kim (Seoul National University, South Korea), Pascal Sainrat (IRIT - Université de Toulouse, France)
Program Chairs: Steven S. Lumetta (University of Illinois at Urbana-Champaign, USA), Nacho Navarro (Universitat Politecnica de Catalunya, Spain)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
instruction prefetching
instruction re-ordering
low power front-end
software hints
tagless instruction cache
unified instruction cache and BTB
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 52 of 230 submissions, 23%