skip to main content
10.1145/2370816.2370870acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article

Base-delta-immediate compression: practical data compression for on-chip caches

Published: 19 September 2012 Publication History

Abstract

Cache compression is a promising technique to increase on-chip cache capacity and to decrease on-chip and off-chip bandwidth usage. Unfortunately, directly applying well-known compression algorithms (usually implemented in software) leads to high hardware complexity and unacceptable decompression/compression latencies, which in turn can negatively affect performance. Hence, there is a need for a simple yet efficient compression technique that can effectively compress common in-cache data patterns, and has minimal effect on cache access latency.
In this paper, we introduce a new compression algorithm called Base-Delta-Immediate (BΔI) compression, a practical technique for compressing data in on-chip caches. The key idea is that, for many cache lines, the values within the cache line have a low dynamic range - i.e., the differences between values stored within the cache line are small. As a result, a cache line can be represented using a base value and an array of differences whose combined size is much smaller than the original cache line (we call this the base+delta encoding). Moreover, many cache lines intersperse such base+delta values with small values - our BΔI technique efficiently incorporates such immediate values into its encoding.
Compared to prior cache compression approaches, our studies show that BΔI strikes a sweet-spot in the tradeoff between compression ratio, decompression/compression latencies, and hardware complexity. Our results show that BΔI compression improves performance for both single-core (8.1% improvement) and multi-core workloads (9.5% / 11.2% improvement for two/four cores). For many applications, BΔI provides the performance benefit of doubling the cache size of the baseline system, effectively increasing average cache capacity by 1.53X.

References

[1]
B. Abali, H. Franke, D. E. Poff, R. A. Saccone, C. O. Schulz, L. M. Herger, and T. B. Smith. Memory expansion technology (MXT): software support and performance. IBM JRD, 2001.
[2]
A. R. Alameldeen and D. A. Wood. Adaptive cache compression for high-performance processors. In ISCA-31, 2004.
[3]
A. R. Alameldeen and D. A. Wood. Frequent pattern compression: A significance-based compression scheme for L2 caches. Tech. Rep., University of Wisconsin-Madison, 2004.
[4]
S. Balakrishnan and G. S. Sohi. Exploiting value locality in physical register files. In MICRO-36, 2003.
[5]
J. Chen and W. A. Watson-III. Multi-threading performance on commodity multi-core processors. In Proceedings of HPCAsia, 2007.
[6]
X. Chen, L. Yang, R. Dick, L. Shang, and H. Lekatsas. C-pack: A high-performance microprocessor cache compression algorithm. In IEEE TVLSI, Aug. 2010.
[7]
R. Das, A. Mishra, C. Nicopoulos, D. Park, V. Narayanan, R. Iyer, M. Yousif, and C. Das. Performance and power optimization through data compression in network-on-chip architectures. In HPCA, 2008.
[8]
J. Dusser, T. Piquet, and A. Seznec. Zero-content augmented caches. In ICS, 2009.
[9]
M. Ekman and P. Stenström. A robust main-memory compression scheme. In ISCA-32, 2005.
[10]
M. Farrens and A. Park. Dynamic base register caching: a technique for reducing address bus width. In ISCA-18, 1991.
[11]
E. G. Hallnor and S. K. Reinhardt. A fully associative software-managed cache design. In ISCA-27, 2000.
[12]
E. G. Hallnor and S. K. Reinhardt. A unified compressed memory hierarchy. In HPCA-11, 2005.
[13]
D. W. Hammerstrom and E. S. Davidson. Information content of CPU memory referencing behavior. In ISCA-4, 1977.
[14]
D. Huffman. A method for the construction of minimum-redundancy codes. 1952.
[15]
M. M. Islam and P. Stenström. Zero-value caches: Cancelling loads that return zero. In PACT, 2009.
[16]
M. M. Islam and P. Stenström. Characterization and exploitation of narrow-width loads: the narrow-width cache approach. In CASES, 2010.
[17]
A. Jaleel, K. B. Theobald, S. C. Steely, Jr., and J. Emer. High performance cache replacement using re-reference interval prediction (rrip). In ISCA-37, 2010.
[18]
P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg, J. Högberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. 2002.
[19]
D. Molka, D. Hackenberg, R. Schone, and M. Muller. Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system. In PACT, 2009.
[20]
M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely, and J. Emer. Adaptive insertion policies for high performance caching. In ISCA-34, 2007.
[21]
M. K. Qureshi, M. A. Suleman, and Y. N. Patt. Line distillation: Increasing cache capacity by filtering unused words in cache lines. In HPCA-13, 2007.
[22]
M. K. Qureshi, D. Thompson, and Y. N. Patt. The V-Way cache: Demand based associativity via global replacement. ISCA-32, 2005.
[23]
Y. Sazeides and J. E. Smith. The predictability of data values. In MICRO-30, 1997.
[24]
A. B. Sharma, L. Golubchik, R. Govindan, and M. J. Neely. Dynamic data compression in multi-hop wireless networks. In SIGMETRICS, 2009.
[25]
A. Snavely and D. M. Tullsen. Symbiotic job scheduling for a simultaneous multithreaded processor. ASPLOS-9, 2000.
[26]
SPEC CPU2006 Benchmarks. http://www.spec.org/.
[27]
S. Srinath, O. Mutlu, H. Kim, and Y. N. Patt. Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers. In HPCA-13, 2007.
[28]
W. Sun, Y. Lu, F. Wu, and S. Li. DHTC: an effective DXTC-based HDR texture compression scheme. In Graphics Hardware, 2008.
[29]
S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. CACTI 5.1. Technical Report HPL-2008-20, HP Laboratories, 2008.
[30]
Transaction Processing Performance Council. http://www.tpc.org/.
[31]
L. Villa, M. Zhang, and K. Asanovic. Dynamic zero compression for cache energy reduction. In MICRO-33, 2000.
[32]
P. R. Wilson, S. F. Kaplan, and Y. Smaragdakis. The case for compressed caching in virtual memory systems. In USENIX ATC, 1999.
[33]
J. Yang, Y. Zhang, and R. Gupta. Frequent value compression in data caches. In MICRO-33, 2000.
[34]
Y. Zhang, J. Yang, and R. Gupta. Frequent value locality and value-centric data cache design. ASPLOS-9, 2000.
[35]
J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 1977.

Cited By

View all
  • (2025)PAF-Enc: Position Affine Encoding to Reduce Bit-Flips in Non-Volatile Main Memories2025 38th International Conference on VLSI Design and 2024 23rd International Conference on Embedded Systems (VLSID)10.1109/VLSID64188.2025.00059(266-271)Online publication date: 4-Jan-2025
  • (2025)Optimizing Bandwidth Utilization Through Word Based Compression in Main Memories2025 38th International Conference on VLSI Design and 2024 23rd International Conference on Embedded Systems (VLSID)10.1109/VLSID64188.2025.00029(91-96)Online publication date: 4-Jan-2025
  • (2025)Accelerating Deep Neural Networks with a Low-Cost Lossless Compression2025 International Conference on Electronics, Information, and Communication (ICEIC)10.1109/ICEIC64972.2025.10879738(1-4)Online publication date: 19-Jan-2025
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
September 2012
512 pages
ISBN:9781450311823
DOI:10.1145/2370816
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 September 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cache compression
  2. caching
  3. memory

Qualifiers

  • Research-article

Conference

PACT '12
Sponsor:
  • IFIP WG 10.3
  • SIGARCH
  • IEEE CS TCPP
  • IEEE CS TCAA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)143
  • Downloads (Last 6 weeks)16
Reflects downloads up to 01 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2025)PAF-Enc: Position Affine Encoding to Reduce Bit-Flips in Non-Volatile Main Memories2025 38th International Conference on VLSI Design and 2024 23rd International Conference on Embedded Systems (VLSID)10.1109/VLSID64188.2025.00059(266-271)Online publication date: 4-Jan-2025
  • (2025)Optimizing Bandwidth Utilization Through Word Based Compression in Main Memories2025 38th International Conference on VLSI Design and 2024 23rd International Conference on Embedded Systems (VLSID)10.1109/VLSID64188.2025.00029(91-96)Online publication date: 4-Jan-2025
  • (2025)Accelerating Deep Neural Networks with a Low-Cost Lossless Compression2025 International Conference on Electronics, Information, and Communication (ICEIC)10.1109/ICEIC64972.2025.10879738(1-4)Online publication date: 19-Jan-2025
  • (2025)C4ECC: Data Compression for Bandwidth Efficiency Under ECC Protection in GPUs2025 International Conference on Electronics, Information, and Communication (ICEIC)10.1109/ICEIC64972.2025.10879667(1-4)Online publication date: 19-Jan-2025
  • (2024)SabreProceedings of the 18th USENIX Conference on Operating Systems Design and Implementation10.5555/3691938.3691939(1-18)Online publication date: 10-Jul-2024
  • (2024)Hardware Compression Method for On-Chip and Interprocessor Networks with Wide Channels and Wormhole Flow Control PolicyМетодика компрессии данных в накристальных и межпроцессорных сетях с широкими каналами и политикой управления потоком wormholeInformatics and AutomationИнформатика и автоматизация10.15622/ia.23.3.823:3(859-885)Online publication date: 28-May-2024
  • (2024)Improving Graph Compression for Efficient Resource-Constrained Graph AnalyticsProceedings of the VLDB Endowment10.14778/3665844.366585217:9(2212-2226)Online publication date: 1-May-2024
  • (2024)HMComp: Extending Near-Memory Capacity using Compression in Hybrid MemoryProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656612(74-84)Online publication date: 30-May-2024
  • (2024)Marple: Scalable Spike Sorting for Untethered Brain-Machine InterfacingProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640357(666-682)Online publication date: 27-Apr-2024
  • (2024)A Low-Cost Fault-Tolerant Racetrack Cache Based on Data CompressionIEEE Transactions on Circuits and Systems II: Express Briefs10.1109/TCSII.2024.337564071:8(3940-3944)Online publication date: Aug-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media