
Heterogeneously tagged caches for low-power embedded systems with virtual memory support

Published: 23 April 2008

Abstract

An energy-efficient data cache organization for embedded processors with virtual memory is proposed. Application knowledge regarding memory references is used to eliminate most tag translations. A novel tagging scheme is introduced, where both virtual and physical tags coexist. Physical tags and special handling of superset index bits are only used for references to shared regions in order to avoid cache inconsistency. By eliminating the need for most address translations on cache access, a significant power reduction is achieved. We outline an efficient hardware architecture, where the application information is captured in a reprogrammable way and the cache is minimally modified.

References

  1. ARM, Ltd. 1995. ARM920T Technical Reference Manual. ARM, Ltd.
  2. Austin, T., Larson, E., and Ernst, D. 2002. SimpleScalar: An infrastructure for computer system modeling. IEEE Comput. 35, 2 (Feb.), 59--67.
  3. Benini, L., Macii, A., and Poncino, M. 2003. Energy-aware design of embedded memories: A survey of technologies, architectures, and optimization techniques. ACM Trans. Embed. Comput. Syst. 2, 1, 5--32.
  4. Benini, L., Menichelli, F., and Olivieri, M. 2004. A class of code compression schemes for reducing power consumption in embedded microprocessor systems. IEEE Trans. Comput. 53, 4, 467--482.
  5. Calder, B., Krintz, C., John, S., and Austin, T. 1998. Cache-conscious data placement. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 139--149.
  6. Cekleov, M. and Dubois, M. 1997. Virtual-address caches. Part 1: Problems and solutions in uniprocessors. IEEE Micro 17, 5 (Sept.), 64--71.
  7. Chilimbi, T. M., Hill, M. D., and Larus, J. R. 1999. Cache-conscious structure layout. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1--12.
  8. Ekman, M., Dahlgren, F., and Stenstrom, P. 2002. TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 243--246.
  9. Intel Corp. 2007. Intel XScale Microarchitecture. Intel Corporation.
  10. Jacob, B. and Mudge, T. 1998. Virtual memory: Issues of implementation. IEEE Comput. 31, 6 (Jun.), 33--43.
  11. Juan, T., Lang, T., and Navarro, J. J. 1997. Reducing TLB power requirements. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 196--201.
  12. Kadayif, I., Nath, P., Kandemir, M., and Sivasubramaniam, A. 2004. Compiler-directed physical address generation for reducing DTLB power. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS), 161--168.
  13. Kadayif, I., Sivasubramaniam, A., Kandemir, M., Kandiraju, G., and Chen, G. 2002. Generating physical addresses directly for saving instruction TLB energy. In Proceedings of the International Symposium on Microarchitecture (MICRO), 185.
  14. Kandemir, M., Kadayif, I., and Chen, G. 2004. Compiler-directed code restructuring for reducing data TLB energy. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 98--103.
  15. Kim, J., Min, S., Jeon, S., Ahn, B., Jeong, D., and Kim, C. 1995. U-cache: A cost-effective solution to the synonym problem. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA), 243--252.
  16. Kulkarni, C., Ghez, C., Miranda, M., Catthoor, F., and Man, H. D. 2005. Cache conscious data layout organization for conflict miss reduction in embedded multimedia applications. IEEE Trans. Comput. 54, 1, 76--81.
  17. Lee, C., Potkonjak, M., and Mangione-Smith, W. H. 1997. MediaBench: A tool for evaluating and synthesizing multimedia and communications systems. In Proceedings of the International Symposium on Microarchitecture (MICRO), 330--335.
  18. Lee, J. H., Lee, J. S., Jeong, S., and Kim, S. 2001. A banked-promotion TLB for high performance and low power. In Proceedings of the IEEE International Conference on Computer Design (ICCD), 118--123.
  19. Middha, B., Simpson, M., and Barua, R. 2005. MTSS: Multi-task stack sharing for embedded systems. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 191--201.
  20. Panda, P. R., Catthoor, F., Dutt, N. D., Danckaert, K., Brockmeyer, E., Kulkarni, C., Vandercappelle, A., and Kjeldsberg, P. G. 2001. Data and memory optimization techniques for embedded systems. ACM Trans. Des. Autom. Electron. Syst. 6, 2, 149--206.
  21. Petrov, P., Tracy, D., and Orailoglu, A. 2005. Energy-efficient physically tagged caches for embedded processors with virtual memory. In Proceedings of the IEEE/ACM Design Automation Conference (DAC), 17--22.
  22. Qiu, X. and Dubois, M. 2001. Towards virtually-addressed memory hierarchies. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA), 51--62.
  23. Simpson, M., Middha, B., and Barua, R. 2005. Segment protection for embedded systems using run-time checks. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 66--77.
  24. Tarjan, D., Thoziyoor, S., and Jouppi, N. 2006. CACTI 4.0: An integrated cache timing, power and area model. Tech. Rep., HP Laboratories, Palo Alto, CA, June.
  25. Vratonjic, M., Zeydel, B., and Oklobdzija, V. 2005. Low- and ultra low-power arithmetic units: Design and comparison. In Proceedings of the International Conference on Computer Design (ICCD), 249--252.
  26. Woo, D., Ghosh, M., Ozer, E., Biles, S., and Lee, H.-H. 2006. Reducing energy of virtual cache synonym lookup using Bloom filters. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 179--189.


    Reviews

    Gabriel Mateescu

    Zhou and Petrov propose a cache architecture that aims to provide fast data access and low power consumption.

    The memory hierarchy of modern computer systems includes cache memory: high-speed memory that is faster but smaller than main memory. Caches reduce the average latency of memory accesses, and can be organized in multiple levels, with size increasing and speed decreasing at each level. The processor accesses the cache in two steps: cache indexing and tag comparison. In cache indexing, the least significant bits of the memory address select a cache set, where a set consists of one (for direct-mapped caches) or several (for set-associative caches) cache lines. Each cache line consists of data, a tag, and state bits, with the tag holding the high-order bits of the memory address. During tag comparison, the tags of all lines in the selected set are compared against the memory address; if one matches, a cache hit occurs and the data is served from the cache.

    In systems with virtual memory, the processor issues virtual addresses that are translated into physical addresses by a combination of software and hardware. The virtual address space is divided into virtual pages, and the physical address space into page frames. A per-process structure in memory, the page table, maps virtual page numbers to physical page numbers, and a special cache, the translation lookaside buffer (TLB), caches page table entries: "TLB is usually implemented as a highly associative cache structure which consumes a significant amount of power." The address used for indexing and for tagging the cache can be either the virtual or the physical address. If both are physical addresses, the architecture is called a physical cache; otherwise, it is called a virtual cache.
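The two-step lookup described above can be sketched as follows. The cache geometry here (a 4 KB direct-mapped cache with 32-byte lines) is an arbitrary illustration, not a configuration taken from the paper:

```python
# Illustrative geometry: 4 KB direct-mapped cache, 32-byte lines.
LINE_SIZE = 32        # bytes per cache line
NUM_SETS = 128        # 4 KB / 32 B = 128 sets of one line each

def split_address(addr):
    """Split an address into (tag, set index, line offset)."""
    offset = addr % LINE_SIZE                    # least significant bits
    index = (addr // LINE_SIZE) % NUM_SETS       # middle bits select the set
    tag = addr // (LINE_SIZE * NUM_SETS)         # remaining high-order bits
    return tag, index, offset

def lookup(cache, addr):
    """Step 1: index selects the line; step 2: the stored tag is compared."""
    tag, index, _ = split_address(addr)
    stored_tag = cache.get(index)
    return stored_tag is not None and stored_tag == tag   # hit iff tags match

cache = {}
tag, index, _ = split_address(0x1234)
cache[index] = tag                                   # fill the line
print(lookup(cache, 0x1234))                         # True: hit
print(lookup(cache, 0x1234 + LINE_SIZE * NUM_SETS))  # False: same set, different tag
```

The second probe maps to the same set but carries a different tag, which is exactly the case the tag comparison step exists to catch.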
    The most common kinds of virtual caches are those indexed and tagged with virtual address bits (V/V caches), and those indexed with virtual bits but tagged with physical bits (V/P caches). Physical caches require address translation before cache indexing on every memory access. The TLB access this entails incurs both a performance penalty (the TLB sits on the memory access path) and a power overhead (the TLB itself consumes power). In contrast, V/V caches need no address translation (and thus no TLB access) on a cache access, which yields fast access and low power consumption. Their drawback is potential cache consistency problems, which can arise when the operating system changes a virtual-to-physical page mapping, or when multiple processes share physical memory (that is, parts of the virtual address spaces of two processes are mapped to the same physical memory). The kinds of consistency problems that can occur are synonyms, aliases, homonyms, and cache coherence, as defined by Cekleov and Dubois [1]. In uniprocessor systems, cache coherence problems can occur when synonyms for shared writable data exist. Since information in instruction caches is not modified by processes, V/V caches can safely be used as instruction caches. The homonym problem is solved by extending the virtual tags with the ID of the process that issues the virtual address. In V/P caches, cache indexing proceeds in parallel with address translation, hiding some of the translation latency; tag comparison occurs once both complete. V/P caches consume more power and are slower than V/V caches, but are faster than physical caches.
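The homonym fix mentioned above, extending virtual tags with a process ID, can be sketched as follows; the names and geometry are illustrative, not taken from the paper:

```python
# Minimal sketch of a V/V cache whose tags are extended with the issuing
# process ID, so the same virtual address from two different processes
# (a homonym) cannot wrongly hit on the other process's data.
LINE_SIZE = 32
NUM_SETS = 128

def vv_fill(cache, pid, vaddr, data):
    index = (vaddr // LINE_SIZE) % NUM_SETS
    tag = vaddr // (LINE_SIZE * NUM_SETS)
    cache[index] = (pid, tag, data)          # tag extended with the process ID

def vv_lookup(cache, pid, vaddr):
    index = (vaddr // LINE_SIZE) % NUM_SETS
    tag = vaddr // (LINE_SIZE * NUM_SETS)
    entry = cache.get(index)
    if entry is not None and entry[0] == pid and entry[1] == tag:
        return entry[2]                      # hit: process ID and tag both match
    return None                              # miss: a bare virtual tag would have
                                             # matched here for any process

cache = {}
vv_fill(cache, pid=1, vaddr=0x1000, data="A")
print(vv_lookup(cache, pid=1, vaddr=0x1000))  # "A": same process, hit
print(vv_lookup(cache, pid=2, vaddr=0x1000))  # None: homonym correctly misses
```

No TLB access appears anywhere in the lookup path, which is the source of the V/V cache's speed and power advantage.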
    However, V/P caches have the advantage over V/V caches that cache consistency problems can easily be avoided, so they can safely be used as data caches. The architecture proposed by the authors tries to combine the low power consumption and fast access of V/V caches with the elimination of consistency problems provided by V/P caches. The authors introduce a hybrid tagging scheme that uses virtual tags for private data and physical tags for shared data, employing application-specific information to decide which kind of tag to use for a given virtual page. Shared pages are identified through a combination of source-code annotation, compiler support, and additional hardware: the application declares shared data using #pragma directives; the compiler maps data declared as shared into a portion of the virtual address space reserved for it; and simple combinational logic (for example, a three-input AND gate, if the reserved region is identified by ones in the three most significant address bits) detects shared pages at runtime.

    The merits of the proposed technique are questionable. First, the requirement to annotate applications means that existing applications must be modified. Second, applying the technique requires compiler support and changes to the processor logic. The authors' presentation is not very clear: some ideas are restated repeatedly with slightly different phrasing, and the paper contains technical mistakes. For example, the authors incorrectly define cache aliasing as "a situation where the same virtual address from different tasks is mapped to different physical addresses"; in fact, this defines homonyms [1].

    Online Computing Reviews Service
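The shared-page detection described by the reviewer, a reserved region marked by ones in the three most significant address bits and checked by a three-input AND gate, amounts to a mask-and-compare on the virtual address. A sketch for a 32-bit address space, with the region placement assumed purely for illustration:

```python
# Assumed for illustration: 32-bit virtual addresses, with the shared
# region occupying the top eighth of the address space (bits 31..29 all
# set), mirroring the reviewer's three-input AND-gate example.
SHARED_MASK = 0xE0000000  # the three most significant bits of a 32-bit address

def is_shared(vaddr):
    """Model of the AND gate: true iff bits 31..29 of the address are all ones."""
    return (vaddr & SHARED_MASK) == SHARED_MASK

print(is_shared(0xF0001234))  # True: falls inside the reserved shared region
print(is_shared(0x40001234))  # False: private page, a virtual tag suffices
```

A hit on this check would steer the access toward physical tagging (and hence a TLB translation), while every other access keeps its virtual tag and skips the TLB entirely.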


    • Published in

      ACM Transactions on Design Automation of Electronic Systems, Volume 13, Issue 2 (April 2008), 272 pages
      ISSN: 1084-4309
      EISSN: 1557-7309
      DOI: 10.1145/1344418

      Copyright © 2008 ACM


      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      • Received: 1 April 2007
      • Accepted: 1 December 2007
      • Published: 23 April 2008


      Qualifiers

      • research-article
      • Research
      • Refereed
