DOI: 10.1145/3240302.3240429
research-article

Opportunistic compression for direct-mapped DRAM caches

Published: 01 October 2018

Abstract

Large off-chip DRAM caches offer performance and bandwidth improvements for many systems by bridging the gap between on-chip last-level caches and off-chip memories. To avoid the high hit latency resulting from serial DRAM accesses for tags and data, prior work proposed co-locating tags and data so that they can be accessed together. The state-of-the-art block-based DRAM cache design, the Alloy Cache, reduces hit latency but suffers from an increased miss rate due to its direct-mapped design.
In this paper, we propose using compression to increase the associativity of a direct-mapped DRAM cache with little impact on hit latency. If the fill line, the victim line, and the victim tag can be compressed into a single block, the cache effectively becomes a two-way set-associative cache. This mechanism can be extended to compress more lines together and achieve higher associativity. We propose using a low-latency compression algorithm to avoid performance losses. Our analysis of the SPEC CPU2006 benchmarks shows that nearly 36% of all sets become two-way, which increases effective DRAM cache capacity and reduces conflict misses.
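
The fill policy described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 64-byte block, the 4-byte cost of the co-located victim tag, the class and function names, and in particular `toy_compressed_size`, a zero-word-elimination stand-in for whatever low-latency compressor the authors actually use.

```python
from dataclasses import dataclass, field

BLOCK_BYTES = 64  # assumed capacity of one DRAM cache block
TAG_BYTES = 4     # assumed space taken by the co-located victim tag
WORD_BYTES = 4


def toy_compressed_size(line: bytes) -> int:
    """Toy stand-in compressor: a presence bitmap plus every non-zero word."""
    words = [line[i:i + WORD_BYTES] for i in range(0, len(line), WORD_BYTES)]
    bitmap_bytes = (len(words) + 7) // 8
    nonzero_words = sum(1 for w in words if any(w))
    return bitmap_bytes + nonzero_words * WORD_BYTES


@dataclass
class CacheSet:
    # (tag, line) pairs: one entry is plain direct-mapped, two is effectively 2-way.
    entries: list = field(default_factory=list)


class OpportunisticCache:
    """Hypothetical model of a direct-mapped cache that packs two lines when they fit."""

    def __init__(self, num_sets: int):
        self.sets = [CacheSet() for _ in range(num_sets)]

    def _locate(self, addr: int):
        block = addr // BLOCK_BYTES
        return self.sets[block % len(self.sets)], block // len(self.sets)

    def lookup(self, addr: int):
        cache_set, tag = self._locate(addr)
        for entry_tag, line in cache_set.entries:
            if entry_tag == tag:
                return line
        return None

    def fill(self, addr: int, line: bytes) -> int:
        """Install a line and return how many lines the set holds afterwards."""
        cache_set, tag = self._locate(addr)
        # Ignore any stale copy of the line being filled.
        residents = [e for e in cache_set.entries if e[0] != tag]
        kept = [(tag, line)]
        if residents:
            victim_tag, victim = residents[0]  # most recently filled resident
            packed = (toy_compressed_size(line)
                      + toy_compressed_size(victim) + TAG_BYTES)
            if packed <= BLOCK_BYTES:
                # Fill line, victim line, and victim tag fit in one block.
                kept.append((victim_tag, victim))
        cache_set.entries = kept
        return len(kept)
```

In this sketch the victim is retained only when the packed size fits in one block; otherwise the set falls back to ordinary direct-mapped replacement, which is why the compression is opportunistic. A line that does not compress is simply stored uncompressed on its own.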


Published In

MEMSYS '18: Proceedings of the International Symposium on Memory Systems
October 2018
361 pages
ISBN:9781450364751
DOI:10.1145/3240302
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DRAM cache
  2. alloy cache
  3. cache compression
  4. stacked DRAM

Qualifiers

  • Research-article

Conference

MEMSYS '18
MEMSYS '18: The International Symposium on Memory Systems
October 1 - 4, 2018
Alexandria, Virginia, USA

Cited By

  • (2025) Optimizing Bandwidth Utilization Through Word Based Compression in Main Memories. 2025 38th International Conference on VLSI Design and 2024 23rd International Conference on Embedded Systems (VLSID), pp. 91-96. DOI: 10.1109/VLSID64188.2025.00029. Online publication date: 4-Jan-2025
  • (2024) A Low-Cost Fault-Tolerant Racetrack Cache Based on Data Compression. IEEE Transactions on Circuits and Systems II: Express Briefs, 71:8, pp. 3940-3944. DOI: 10.1109/TCSII.2024.3375640. Online publication date: Aug-2024
  • (2024) Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 996-1011. DOI: 10.1109/HPCA57654.2024.00080. Online publication date: 2-Mar-2024
  • (2024) A novel approximate cache block compressor for error-resilient image data. Computers and Electrical Engineering, 115, article 109106. DOI: 10.1016/j.compeleceng.2024.109106. Online publication date: Apr-2024
  • (2022) Exploiting Inter-block Entropy to Enhance the Compressibility of Blocks with Diverse Data. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1100-1114. DOI: 10.1109/HPCA53966.2022.00084. Online publication date: Apr-2022
  • (2022) A Case for Partial Co-allocation Constraints in Compressed Caches. Embedded Computer Systems: Architectures, Modeling, and Simulation, pp. 65-77. DOI: 10.1007/978-3-031-04580-6_5. Online publication date: 27-Apr-2022
  • (2021) Conciliating Speed and Efficiency on Cache Compressors. 2021 IEEE 39th International Conference on Computer Design (ICCD), pp. 442-446. DOI: 10.1109/ICCD53106.2021.00075. Online publication date: Oct-2021
  • (2019) ZCOMP. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 126-138. DOI: 10.1145/3352460.3358305. Online publication date: 12-Oct-2019
