
Heterogeneously tagged caches for low-power embedded systems with virtual memory support

Published: 23 April 2008

Abstract

An energy-efficient data cache organization for embedded processors with virtual memory is proposed. Application knowledge regarding memory references is used to eliminate most tag translations. A novel tagging scheme is introduced, where both virtual and physical tags coexist. Physical tags and special handling of superset index bits are only used for references to shared regions in order to avoid cache inconsistency. By eliminating the need for most address translations on cache access, a significant power reduction is achieved. We outline an efficient hardware architecture, where the application information is captured in a reprogrammable way and the cache is minimally modified.

References

  1. ARM, Ltd. 1995. ARM920T Technical Reference Manual. ARM, Ltd.
  2. Austin, T., Larson, E., and Ernst, D. 2002. SimpleScalar: An infrastructure for computer system modeling. IEEE Comput. 35, 2 (Feb.), 59--67.
  3. Benini, L., Macii, A., and Poncino, M. 2003. Energy-aware design of embedded memories: A survey of technologies, architectures, and optimization techniques. ACM Trans. Embed. Comput. Syst. 2, 1, 5--32.
  4. Benini, L., Menichelli, F., and Olivieri, M. 2004. A class of code compression schemes for reducing power consumption in embedded microprocessor systems. IEEE Trans. Comput. 53, 4, 467--482.
  5. Calder, B., Krintz, C., John, S., and Austin, T. 1998. Cache-conscious data placement. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 139--149.
  6. Cekleov, M. and Dubois, M. 1997. Virtual-address caches. Part 1: Problems and solutions in uniprocessors. IEEE Micro 17, 5 (Sept.), 64--71.
  7. Chilimbi, T. M., Hill, M. D., and Larus, J. R. 1999. Cache-conscious structure layout. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1--12.
  8. Ekman, M., Dahlgren, F., and Stenstrom, P. 2002. TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 243--246.
  9. Intel Corp. 2007. Intel XScale Microarchitecture. Intel Corporation.
  10. Jacob, B. and Mudge, T. 1998. Virtual memory: Issues of implementation. IEEE Comput. 31, 6 (Jun.), 33--43.
  11. Juan, T., Lang, T., and Navarro, J. J. 1997. Reducing TLB power requirements. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 196--201.
  12. Kadayif, I., Nath, P., Kandemir, M., and Sivasubramaniam, A. 2004. Compiler-directed physical address generation for reducing DTLB power. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS), 161--168.
  13. Kadayif, I., Sivasubramaniam, A., Kandemir, M., Kandiraju, G., and Chen, G. 2002. Generating physical addresses directly for saving instruction TLB energy. In Proceedings of the International Symposium on Microarchitecture (MICRO), 185.
  14. Kandemir, M., Kadayif, I., and Chen, G. 2004. Compiler-directed code restructuring for reducing data TLB energy. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 98--103.
  15. Kim, J., Min, S., Jeon, S., Ahn, B., Jeong, D., and Kim, C. 1995. U-cache: A cost-effective solution to the synonym problem. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA), 243--252.
  16. Kulkarni, C., Ghez, C., Miranda, M., Catthoor, F., and Man, H. D. 2005. Cache conscious data layout organization for conflict miss reduction in embedded multimedia applications. IEEE Trans. Comput. 54, 1, 76--81.
  17. Lee, C., Potkonjak, M., and Mangione-Smith, W. H. 1997. MediaBench: A tool for evaluating and synthesizing multimedia and communications systems. In Proceedings of the International Symposium on Microarchitecture (MICRO), 330--335.
  18. Lee, J. H., Lee, J. S., Jeong, S., and Kim, S. 2001. A banked-promotion TLB for high performance and low power. In Proceedings of the IEEE International Conference on Computer Design (ICCD), 118--123.
  19. Middha, B., Simpson, M., and Barua, R. 2005. MTSS: Multi-task stack sharing for embedded systems. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 191--201.
  20. Panda, P. R., Catthoor, F., Dutt, N. D., Danckaert, K., Brockmeyer, E., Kulkarni, C., Vandercappelle, A., and Kjeldsberg, P. G. 2001. Data and memory optimization techniques for embedded systems. ACM Trans. Des. Autom. Electron. Syst. 6, 2, 149--206.
  21. Petrov, P., Tracy, D., and Orailoglu, A. 2005. Energy-efficient physically tagged caches for embedded processors with virtual memory. In Proceedings of the IEEE/ACM Design Automation Conference (DAC), 17--22.
  22. Qiu, X. and Dubois, M. 2001. Towards virtually-addressed memory hierarchies. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA), 51--62.
  23. Simpson, M., Middha, B., and Barua, R. 2005. Segment protection for embedded systems using run-time checks. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 66--77.
  24. Tarjan, D., Thoziyoor, S., and Jouppi, N. 2006. CACTI 4.0: An integrated cache timing, power and area model. Tech. Rep., HP Laboratories, Palo Alto, CA, June.
  25. Vratonjic, M., Zeydel, B., and Oklobdzija, V. 2005. Low- and ultra low-power arithmetic units: Design and comparison. In Proceedings of the International Conference on Computer Design (ICCD), 249--252.
  26. Woo, D., Ghosh, M., Ozer, E., Biles, S., and Lee, H.-H. 2006. Reducing energy of virtual cache synonym lookup using Bloom filters. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 179--189.


    Reviews

    Gabriel Mateescu

    Zhou and Petrov propose a cache architecture that aims to provide fast data access and low power consumption.

    The memory hierarchy of modern computer systems includes cache memory: high-speed memory that is faster but smaller than main memory. Caches reduce the average latency of memory accesses, and can be organized in multiple levels, with size increasing and speed decreasing at each level. The processor accesses the cache in two steps: cache indexing and tag comparison. In cache indexing, the least significant bits of the memory address select a cache set, where a set consists of one (for direct-mapped caches) or several (for set-associative caches) cache lines. Each cache line consists of data, a tag, and state bits, with the tag holding the high-order bits of the memory address. During tag comparison, the tags of all lines in the selected set are compared against the memory address; if one matches, a cache hit occurs and the data is served from the cache.

    In systems with virtual memory, the processor issues virtual addresses that are translated into physical addresses by a combination of software and hardware. The virtual address space is divided into virtual pages, and the physical address space into page frames. A per-process structure in memory, the page table, maps virtual page numbers to physical page numbers, and a special cache, the translation lookaside buffer (TLB), caches page table entries: "TLB is usually implemented as a highly associative cache structure which consumes a significant amount of power." The address used for indexing and for tagging the cache can be either the virtual or the physical address. If both are physical addresses, the architecture is called a physical cache; otherwise, it is called a virtual cache.
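The two-step lookup described above can be sketched as follows. The cache geometry here (a 4 KB direct-mapped cache with 32-byte lines) is an arbitrary illustration, not a configuration taken from the paper:

```python
# Illustrative geometry: 4 KB direct-mapped cache, 32-byte lines.
LINE_SIZE = 32        # bytes per cache line
NUM_SETS = 128        # 4 KB / 32 B = 128 sets of one line each

def split_address(addr):
    """Split an address into (tag, set index, line offset)."""
    offset = addr % LINE_SIZE                    # least significant bits
    index = (addr // LINE_SIZE) % NUM_SETS       # middle bits select the set
    tag = addr // (LINE_SIZE * NUM_SETS)         # remaining high-order bits
    return tag, index, offset

def lookup(cache, addr):
    """Step 1: index selects the line; step 2: the stored tag is compared."""
    tag, index, _ = split_address(addr)
    stored_tag = cache.get(index)
    return stored_tag is not None and stored_tag == tag   # hit iff tags match

cache = {}
tag, index, _ = split_address(0x1234)
cache[index] = tag                                   # fill the line
print(lookup(cache, 0x1234))                         # True: hit
print(lookup(cache, 0x1234 + LINE_SIZE * NUM_SETS))  # False: same set, different tag
```

The second probe maps to the same set but carries a different tag, which is exactly the case the tag comparison step exists to catch.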
    The most common kinds of virtual caches are those indexed and tagged with virtual address bits (V/V caches), and those indexed with virtual bits but tagged with physical bits (V/P caches). Physical caches require address translation before cache indexing on every memory access. The TLB access this entails incurs both a performance penalty (the TLB sits on the memory access path) and a power overhead (the TLB itself consumes power). In contrast, V/V caches need no address translation (and thus no TLB access) on a cache access, which yields fast access and low power consumption. Their drawback is potential cache consistency problems, which can arise when the operating system changes a virtual-to-physical page mapping, or when multiple processes share physical memory (that is, parts of the virtual address spaces of two processes are mapped to the same physical memory). The kinds of consistency problems that can occur are synonyms, aliases, homonyms, and cache coherence, as defined by Cekleov and Dubois [1]. In uniprocessor systems, cache coherence problems can occur when synonyms for shared writable data exist. Since information in instruction caches is not modified by processes, V/V caches can safely be used as instruction caches. The homonym problem is solved by extending the virtual tags with the ID of the process that issues the virtual address. In V/P caches, cache indexing proceeds in parallel with address translation, hiding some of the translation latency; tag comparison occurs once both complete. V/P caches consume more power and are slower than V/V caches, but are faster than physical caches.
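The homonym fix mentioned above, extending virtual tags with a process ID, can be sketched as follows; the names and geometry are illustrative, not taken from the paper:

```python
# Minimal sketch of a V/V cache whose tags are extended with the issuing
# process ID, so the same virtual address from two different processes
# (a homonym) cannot wrongly hit on the other process's data.
LINE_SIZE = 32
NUM_SETS = 128

def vv_fill(cache, pid, vaddr, data):
    index = (vaddr // LINE_SIZE) % NUM_SETS
    tag = vaddr // (LINE_SIZE * NUM_SETS)
    cache[index] = (pid, tag, data)          # tag extended with the process ID

def vv_lookup(cache, pid, vaddr):
    index = (vaddr // LINE_SIZE) % NUM_SETS
    tag = vaddr // (LINE_SIZE * NUM_SETS)
    entry = cache.get(index)
    if entry is not None and entry[0] == pid and entry[1] == tag:
        return entry[2]                      # hit: process ID and tag both match
    return None                              # miss: a bare virtual tag would have
                                             # matched here for any process

cache = {}
vv_fill(cache, pid=1, vaddr=0x1000, data="A")
print(vv_lookup(cache, pid=1, vaddr=0x1000))  # "A": same process, hit
print(vv_lookup(cache, pid=2, vaddr=0x1000))  # None: homonym correctly misses
```

No TLB access appears anywhere in the lookup path, which is the source of the V/V cache's speed and power advantage.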
    However, V/P caches have the advantage over V/V caches that cache consistency problems can easily be avoided, so they can safely be used as data caches. The architecture proposed by the authors tries to combine the low power consumption and fast access of V/V caches with the elimination of consistency problems provided by V/P caches. The authors introduce a hybrid tagging scheme that uses virtual tags for private data and physical tags for shared data, employing application-specific information to decide which kind of tag to use for a given virtual page. Shared pages are identified through a combination of source-code annotation, compiler support, and additional hardware: the application declares shared data using #pragma directives; the compiler maps data declared as shared into a portion of the virtual address space reserved for it; and simple combinational logic (for example, a three-input AND gate, if the reserved region is identified by ones in the three most significant address bits) detects shared pages at runtime.

    The merits of the proposed technique are questionable. First, the requirement to annotate applications means that existing applications must be modified. Second, applying the technique requires compiler support and changes to the processor logic. The authors' presentation is not very clear: some ideas are restated repeatedly with slightly different phrasing, and the paper contains technical mistakes. For example, the authors incorrectly define cache aliasing as "a situation where the same virtual address from different tasks is mapped to different physical addresses"; in fact, this defines homonyms [1].

    Online Computing Reviews Service
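The shared-page detection described by the reviewer, a reserved region marked by ones in the three most significant address bits and checked by a three-input AND gate, amounts to a mask-and-compare on the virtual address. A sketch for a 32-bit address space, with the region placement assumed purely for illustration:

```python
# Assumed for illustration: 32-bit virtual addresses, with the shared
# region occupying the top eighth of the address space (bits 31..29 all
# set), mirroring the reviewer's three-input AND-gate example.
SHARED_MASK = 0xE0000000  # the three most significant bits of a 32-bit address

def is_shared(vaddr):
    """Model of the AND gate: true iff bits 31..29 of the address are all ones."""
    return (vaddr & SHARED_MASK) == SHARED_MASK

print(is_shared(0xF0001234))  # True: falls inside the reserved shared region
print(is_shared(0x40001234))  # False: private page, a virtual tag suffices
```

A hit on this check would steer the access toward physical tagging (and hence a TLB translation), while every other access keeps its virtual tag and skips the TLB entirely.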


    • Published in

      ACM Transactions on Design Automation of Electronic Systems, Volume 13, Issue 2 (April 2008), 272 pages
      ISSN: 1084-4309
      EISSN: 1557-7309
      DOI: 10.1145/1344418

      Copyright © 2008 ACM


      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      • Received: 1 April 2007
      • Accepted: 1 December 2007
      • Published: 23 April 2008


      Qualifiers

      • research-article
      • Research
      • Refereed
