Research Article | Open Access
DOI: 10.1145/3559009.3569653

FlatPack: Flexible Compaction of Compressed Memory

Published: 27 January 2023

Abstract

The capacity and bandwidth of main memory are increasingly important factors in computer system performance. Memory compression and compaction have been combined to increase effective capacity and reduce costly page faults. However, existing systems typically maintain compaction at the expense of bandwidth. One major cause of extra traffic in such systems is page overflows, which occur when data compressibility degrades and compressed pages must be reorganized. This paper introduces FlatPack, a novel approach to memory compaction that mitigates this overhead by reorganizing compressed data dynamically with less data movement. Reorganization is carried out by an addition to the memory controller, without intervention from software. FlatPack maintains memory capacity competitive with current state-of-the-art memory compression designs while reducing mean memory traffic by up to 67%. This yields average improvements in performance and total system energy consumption over existing memory compression solutions of 31--46% and 11--25%, respectively. In total, FlatPack improves on baseline performance and energy consumption by 108% and 40%, respectively, in a single-core system, and by 83% and 23%, respectively, in a multi-core system.
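To make the overflow problem concrete, the toy model below contrasts the data movement of two page layouts when one compressed block outgrows its slot. It is a minimal, hypothetical Python sketch under assumed parameters (a 4 KiB page holding 64 compressed cache lines, a fixed per-page slack region); it is not FlatPack's actual mechanism, which the paper implements as an addition to the memory controller.

# Hypothetical illustration (not FlatPack's actual design): compare the bytes
# moved when one compressed block in a page outgrows its slot under
# (a) a tightly packed layout that must shift the rest of the page, and
# (b) a layout that reserves per-page slack so only the grown block is rewritten.

PAGE_BLOCKS = 64  # assumed: 64-byte cache lines per 4 KiB page


def packed_overflow_traffic(block_sizes, idx, new_size):
    """Packed layout: blocks are stored back to back, so growing block `idx`
    forces every later block to shift; count the bytes that must move."""
    moved = new_size                      # rewrite the grown block itself
    moved += sum(block_sizes[idx + 1:])   # shift everything stored after it
    return moved


def slack_overflow_traffic(block_sizes, idx, new_size, slack):
    """Layout with a reserved slack region: if the grown block fits there,
    only that block is rewritten; otherwise fall back to re-packing."""
    if new_size <= slack:
        return new_size
    return packed_overflow_traffic(block_sizes, idx, new_size)


if __name__ == "__main__":
    # A page whose 64 lines compress to 32 bytes each; line 5 degrades to 64 B.
    sizes = [32] * PAGE_BLOCKS
    grown_index, grown_size = 5, 64

    packed = packed_overflow_traffic(sizes, grown_index, grown_size)
    flexible = slack_overflow_traffic(sizes, grown_index, grown_size, slack=256)
    print(f"packed layout moves   {packed} bytes on overflow")
    print(f"flexible layout moves {flexible} bytes on overflow")

Running this sketch shows the packed layout moving roughly the remainder of the page (about 1.9 KB here) for a single 64-byte overflow, while the slack-based layout rewrites only the grown block; this difference in data movement is the kind of overflow traffic the abstract refers to.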

Information

Published In

PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques
October 2022
569 pages
ISBN:9781450398688
DOI:10.1145/3559009
This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

  • IFIP WG 10.3
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. memory compression
  2. memory system

Qualifiers

  • Research-article

Funding Sources

  • Swedish Foundation for Strategic Research

Conference

PACT '22

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
