DOI: 10.1145/1289881.1289926
Article

A low power front-end for embedded processors using a block-aware instruction set

Published: 30 September 2007

ABSTRACT

Energy, power, and area efficiency are critical design concerns for embedded processors. Much of the energy of a typical embedded processor is consumed in the front-end since instruction fetching happens on nearly every cycle and involves accesses to large memory arrays such as instruction and branch target caches. The use of small front-end arrays leads to significant power and area savings, but typically results in significant performance degradation. This paper evaluates and compares optimizations that improve the performance of embedded processors with small front-end caches. We examine both software techniques, such as instruction re-ordering and selective caching, and hardware techniques, such as instruction prefetching, tagless instruction caches, and unified caches for instruction and branch targets. We demonstrate that, building on top of a block-aware instruction set, these optimizations can eliminate the performance degradation due to small front-end caches. Moreover, selective combinations of these optimizations lead to an embedded processor that performs significantly better than the large cache design while maintaining the area and energy efficiency of the small cache design.

Published in

CASES '07: Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
September 2007
292 pages
ISBN: 9781595938268
DOI: 10.1145/1289881

Copyright © 2007 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 52 of 230 submissions, 23%
