ABSTRACT
High-performance processors use a large, multi-ported, set-associative L1 data cache. As clock speed and cache size increase, such a cache consumes a significant fraction of the total processor energy. This paper proposes a method of saving energy by reducing the number of data cache accesses. It modifies the Load/Store Queue (LSQ) design to allow "caching" of previously accessed data values on both loads and stores even after the corresponding memory access instruction has committed. A 32-entry modified LSQ is shown to supply data to an average of 38.5% of loads in the SpecINT95 benchmarks and 18.9% of loads in the SpecFP95 benchmarks. The resulting reduction in the number of L1 cache accesses yields up to a 40% reduction in L1 data cache energy consumption and up to a 16% improvement in the energy-delay product, while requiring almost no additional hardware or complex control logic.
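The mechanism described above can be illustrated with a minimal sketch (not the paper's exact hardware design; the class name, FIFO replacement, and counters are assumptions for illustration): a small queue whose entries remain valid after commit, so a later load to the same address is served from the queue instead of triggering an L1 data cache access.

```python
# Illustrative model of a "cached" load/store queue: committed entries
# are retained so later loads to the same address skip the L1 data cache.
from collections import OrderedDict

class CachedLSQ:
    def __init__(self, entries=32):
        self.entries = entries
        self.queue = OrderedDict()   # address -> value, oldest entry first
        self.lsq_hits = 0            # loads served from the LSQ
        self.cache_accesses = 0      # loads that had to access the L1 cache

    def _insert(self, addr, value):
        # Retain the value after commit; recycle the oldest entry when full.
        if addr in self.queue:
            self.queue.pop(addr)
        elif len(self.queue) >= self.entries:
            self.queue.popitem(last=False)
        self.queue[addr] = value

    def store(self, addr, value, memory):
        memory[addr] = value         # store goes to the memory hierarchy
        self._insert(addr, value)    # and its value is kept in the LSQ

    def load(self, addr, memory):
        if addr in self.queue:       # LSQ hit: no L1 data cache access
            self.lsq_hits += 1
            return self.queue[addr]
        self.cache_accesses += 1     # LSQ miss: fall back to the L1 cache
        value = memory.get(addr, 0)
        self._insert(addr, value)    # loaded values are cached as well
        return value
```

The fraction `lsq_hits / (lsq_hits + cache_accesses)` over a memory trace corresponds to the share of loads satisfied by the LSQ, the quantity the abstract reports as 38.5% (SpecINT95) and 18.9% (SpecFP95).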