research-article

Adaptive loop caching using lightweight runtime control flow analysis

Published: 29 March 2013

Abstract

Loop caches provide an effective method for decreasing memory hierarchy energy consumption by storing frequently executed code (critical regions) in a more energy-efficient structure than the level one cache. However, due to code structure restrictions or costly design-time pre-analysis efforts, previous loop cache designs are not suitable for all applications and system scenarios. We present an adaptive loop cache that is amenable to a wider range of system scenarios and can provide an additional 20% average instruction cache energy savings (with individual benchmark energy savings as high as 69%) compared to the next best loop cache, the preloaded loop cache.

    • Published in

      ACM Transactions on Embedded Computing Systems, Volume 12, Issue 1s
      Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
      March 2013
      701 pages
      ISSN: 1539-9087
      EISSN: 1558-3465
      DOI: 10.1145/2435227

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 March 2013
      • Accepted: 1 May 2011
      • Revised: 1 January 2011
      • Received: 1 September 2010

      Qualifiers

      • research-article
      • Research
      • Refereed
