PageForge: A Near-Memory Content-Aware Page-Merging Architecture

Authors:

Dimitrios Skarlatos,

Josep Torrellas

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 302 - 314

https://doi.org/10.1145/3123939.3124540

Published: 14 October 2017

Abstract

To reduce the memory requirements of virtualized environments, modern hypervisors are equipped with the capability to search the memory address space and merge identical pages, a process called page deduplication. This process uses a combination of data hashing and exhaustive comparison of pages, which consumes processor cycles and pollutes caches.
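As a rough illustration of the hash-then-compare process described above, the following is a minimal software sketch, not the algorithm used by KSM or by any particular hypervisor. The function names, the 4 KB page size, and the use of BLAKE2b as the hash are all assumptions made for this example. The hash acts as an inexpensive filter, and only a full byte-by-byte comparison confirms that two pages are identical.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


def merge_identical_pages(pages):
    """Return a list mapping each page index to the page it would be merged into.

    `pages` is a list of bytes objects, one per page. canonical[i] is the index
    of the first page whose contents are identical to page i (possibly i itself).
    """
    buckets = {}    # hash key -> indices of distinct pages that produced that key
    canonical = []
    for i, page in enumerate(pages):
        # Hashing is only a filter: equal keys do not prove equal pages.
        key = hashlib.blake2b(page, digest_size=8).digest()
        match = None
        for j in buckets.get(key, []):
            if pages[j] == page:  # exhaustive comparison confirms the match
                match = j
                break
        if match is None:
            buckets.setdefault(key, []).append(i)
            canonical.append(i)
        else:
            canonical.append(match)
    return canonical


def frames_needed(canonical):
    """Number of physical frames left after merging."""
    return len(set(canonical))


pages = [bytes(PAGE_SIZE), b"\x01" * PAGE_SIZE, bytes(PAGE_SIZE)]
mapping = merge_identical_pages(pages)
print(mapping, frames_needed(mapping))
```

Every page is read at least once for hashing and again for each candidate comparison, which is the source of the processor-cycle and cache-pollution costs that the abstract mentions.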

In this paper, we present a lightweight hardware mechanism that augments the memory controller and performs the page merging process with minimal hypervisor involvement. Our concept, called PageForge, is effective. It compares pages in the memory controller, and repurposes the Error Correction Codes (ECC) engine to generate accurate and inexpensive ECC-based hash keys. We evaluate PageForge with simulations of a 10-core processor with a virtual machine (VM) on each core, running a set of applications from the TailBench suite. When compared with RedHat's KSM, a state-of-the-art software implementation of page merging, PageForge attains identical savings in memory footprint while substantially reducing the overhead. Compared to a system without same-page merging, PageForge reduces the memory footprint by an average of 48%, enabling the deployment of twice as many VMs for the same physical memory. Importantly, it keeps the average latency overhead to 10%, and the 95th percentile tail latency overhead to 11%. In contrast, in KSM, these latency overheads are 68% and 136%, respectively.
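The link between a 48% footprint reduction and "twice as many VMs" can be checked with a short calculation. This sketch assumes, purely for illustration, that every VM has the same footprint and that the average reduction applies uniformly. The function name is hypothetical.

```python
def vm_capacity_gain(footprint_reduction):
    """Factor by which the VM count can grow for a fixed amount of physical memory.

    If each VM needs only (1 - r) of its original memory, the same memory
    holds 1 / (1 - r) times as many VMs.
    """
    return 1.0 / (1.0 - footprint_reduction)


gain = vm_capacity_gain(0.48)
print(f"{gain:.2f}x")  # prints "1.92x"
```

Under these assumptions the gain is about 1.92x, which the abstract rounds to twice as many VMs.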

Index Terms

PageForge: A Near-Memory Content-Aware Page-Merging Architecture
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
      2. Software infrastructure
        Virtual machines

Information & Contributors

Information

Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 2017

850 pages

ISBN:9781450349529

DOI:10.1145/3123939

General Chairs: Hillery Hunter (IBM Research), Jaime Moreno (IBM Research)

Program Chairs: Joel Emer (NVIDIA and MIT), Daniel Sanchez (MIT)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSF

Conference

MICRO-50

MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 14 - 18, 2017

Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
