Opportunistic compression for direct-mapped DRAM caches

Authors:

Alaa R. Alameldeen,

Rajat Agarwal

MEMSYS '18: Proceedings of the International Symposium on Memory Systems

Pages 129 - 136

https://doi.org/10.1145/3240302.3240429

Published: 01 October 2018

Abstract

Large off-chip DRAM caches offer performance and bandwidth improvements for many systems by bridging the gap between on-chip last-level caches and off-chip memories. To avoid the high hit latency resulting from serial DRAM accesses for tags and data, prior work proposed co-locating tags and data so that they are accessed together. The state-of-the-art block-based DRAM cache design, the Alloy Cache, reduces hit latency but suffers from an increased miss rate due to its direct-mapped design.

In this paper, we propose using compression to increase the associativity of a direct-mapped DRAM cache with little impact on hit latency. If the fill line, the victim line, and the victim tag can be compressed into a single block, the cache effectively becomes a two-way set-associative cache. This mechanism can be extended to compress more lines together and achieve higher associativity. We propose using a low-latency compression algorithm to avoid performance losses. Our analysis of SPEC CPU2006 benchmarks shows that nearly 36% of all sets become 2-way, which increases DRAM cache capacity and reduces conflict misses.
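The fill-time decision described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the 64-byte line, the 8-byte tag, the 72-byte set capacity, the zero-word compressor, and the choice of which resident line to keep are all assumptions made for illustration, and every name below is hypothetical.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64    # assumed cache line size
TAG_BYTES = 8      # assumed tag/metadata size per line
BLOCK_BYTES = 72   # assumed capacity of one set (one tag plus one uncompressed line)


def compressed_size(line: bytes) -> int:
    """Toy stand-in compressor (an assumption, not the paper's algorithm):
    drop all-zero 4-byte words and keep a 2-byte presence bitmap."""
    assert len(line) == LINE_BYTES
    nonzero_words = sum(1 for i in range(0, LINE_BYTES, 4) if any(line[i:i + 4]))
    # Fall back to storing the line uncompressed if compression does not help.
    return min(LINE_BYTES, 2 + 4 * nonzero_words)


@dataclass
class CacheSet:
    # tag -> data, oldest first; holds one line, or two when both compress well
    lines: dict = field(default_factory=dict)

    def fill(self, tag: int, data: bytes) -> None:
        """Install the fill line; keep the victim only if both fit in one block."""
        self.lines.pop(tag, None)
        victim = list(self.lines.items())[-1:]  # assumed policy: keep the newest resident
        candidate = dict(victim + [(tag, data)])
        needed = sum(TAG_BYTES + compressed_size(d) for d in candidate.values())
        # Opportunistic: the set acts as 2-way only when everything fits.
        self.lines = candidate if needed <= BLOCK_BYTES else {tag: data}


class OpportunisticCache:
    def __init__(self, num_sets: int):
        self.num_sets = num_sets
        self.sets = [CacheSet() for _ in range(num_sets)]

    def access(self, line_addr: int, data: bytes) -> bool:
        """Return True on a hit; on a miss, fill the set and return False."""
        index, tag = line_addr % self.num_sets, line_addr // self.num_sets
        cache_set = self.sets[index]
        if tag in cache_set.lines:
            return True
        cache_set.fill(tag, data)
        return False


sparse = bytes([1, 0, 0, 0]) + bytes(60)  # one non-zero word, compresses well
dense = bytes(range(1, 65))               # no zero words, incompressible here

compressible = OpportunisticCache(num_sets=4)
print([compressible.access(a, sparse) for a in (0, 4, 0, 4)])  # [False, False, True, True]

incompressible = OpportunisticCache(num_sets=4)
print([incompressible.access(a, dense) for a in (0, 4, 0, 4)])  # [False, False, False, False]
```

Line addresses 0 and 4 map to the same set. With compressible data both lines stay resident after the two initial misses, so the set behaves as two-way. With incompressible data each fill evicts the other line, which is the conflict-miss pattern of a plain direct-mapped cache.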

Published In

MEMSYS '18: Proceedings of the International Symposium on Memory Systems

October 2018

361 pages

ISBN: 9781450364751

DOI: 10.1145/3240302

General Chair:
Bruce Jacob
University of Maryland

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MEMSYS '18

MEMSYS '18: The International Symposium on Memory Systems

October 1 - 4, 2018

Alexandria, Virginia, USA

Cited By

S A, Verma H, Kapoor H (2025). Optimizing Bandwidth Utilization Through Word Based Compression in Main Memories. 2025 38th International Conference on VLSI Design and 2024 23rd International Conference on Embedded Systems (VLSID), pp. 91-96. Online publication date: 4-Jan-2025.
https://doi.org/10.1109/VLSID64188.2025.00029

Cheshmikhani E, Shokouhinia F, Farbeh H (2024). A Low-Cost Fault-Tolerant Racetrack Cache Based on Data Compression. IEEE Transactions on Circuits and Systems II: Express Briefs 71:8, pp. 3940-3944. Online publication date: Aug-2024.
https://doi.org/10.1109/TCSII.2024.3375640

Buyuktosunoglu A, Trilla D, Abali B, Berger D, Walters C, Lee J (2024). Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 996-1011. Online publication date: 2-Mar-2024.
https://doi.org/10.1109/HPCA57654.2024.00080

Loloeyan P, Nikmehr H, Rezaei M (2024). A novel approximate cache block compressor for error-resilient image data. Computers and Electrical Engineering 115, article 109106. Online publication date: Apr-2024.
https://doi.org/10.1016/j.compeleceng.2024.109106

Kim J, Kang M, Hong J, Kim S (2022). Exploiting Inter-block Entropy to Enhance the Compressibility of Blocks with Diverse Data. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1100-1114. Online publication date: Apr-2022.
https://doi.org/10.1109/HPCA53966.2022.00084

Rodrigues Carvalho D, Seznec A (2022). A Case for Partial Co-allocation Constraints in Compressed Caches. Embedded Computer Systems: Architectures, Modeling, and Simulation, pp. 65-77. Online publication date: 27-Apr-2022.
https://doi.org/10.1007/978-3-031-04580-6_5

Carvalho D, Seznec A (2021). Conciliating Speed and Efficiency on Cache Compressors. 2021 IEEE 39th International Conference on Computer Design (ICCD), pp. 442-446. Online publication date: Oct-2021.
https://doi.org/10.1109/ICCD53106.2021.00075

Akin B, Chishti Z, Alameldeen A (2019). ZCOMP. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 126-138. Online publication date: 12-Oct-2019.
https://dl.acm.org/doi/10.1145/3352460.3358305
