ABSTRACT
High-performance processors use a large, multi-ported, set-associative L1 data cache. As clock speed and cache size increase, such a cache consumes a significant fraction of the total processor energy. This paper proposes a method of saving energy by reducing the number of data cache accesses. It modifies the Load/Store Queue (LSQ) design to allow "caching" of previously accessed data values on both loads and stores even after the corresponding memory access instruction has committed. A 32-entry modified LSQ is shown to supply data to an average of 38.5% of loads in the SpecINT95 benchmarks and 18.9% of loads in the SpecFP95 benchmarks. The resulting reduction in the number of L1 cache accesses yields up to a 40% reduction in L1 data cache energy consumption and up to a 16% improvement in the energy-delay product, while requiring almost no additional hardware or complex control logic.
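The mechanism described above can be illustrated with a minimal sketch (not the paper's exact hardware design; the class name, FIFO replacement, and counters are assumptions for illustration): a small queue whose entries remain valid after commit, so a later load to the same address is served from the queue instead of triggering an L1 data cache access.

```python
# Illustrative model of a "cached" load/store queue: committed entries
# are retained so later loads to the same address skip the L1 data cache.
from collections import OrderedDict

class CachedLSQ:
    def __init__(self, entries=32):
        self.entries = entries
        self.queue = OrderedDict()   # address -> value, oldest entry first
        self.lsq_hits = 0            # loads served from the LSQ
        self.cache_accesses = 0      # loads that had to access the L1 cache

    def _insert(self, addr, value):
        # Retain the value after commit; recycle the oldest entry when full.
        if addr in self.queue:
            self.queue.pop(addr)
        elif len(self.queue) >= self.entries:
            self.queue.popitem(last=False)
        self.queue[addr] = value

    def store(self, addr, value, memory):
        memory[addr] = value         # store goes to the memory hierarchy
        self._insert(addr, value)    # and its value is kept in the LSQ

    def load(self, addr, memory):
        if addr in self.queue:       # LSQ hit: no L1 data cache access
            self.lsq_hits += 1
            return self.queue[addr]
        self.cache_accesses += 1     # LSQ miss: fall back to the L1 cache
        value = memory.get(addr, 0)
        self._insert(addr, value)    # loaded values are cached as well
        return value
```

The fraction `lsq_hits / (lsq_hits + cache_accesses)` over a memory trace corresponds to the share of loads satisfied by the LSQ, the quantity the abstract reports as 38.5% (SpecINT95) and 18.9% (SpecFP95).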