Hardware/software cooperative caching for hybrid DRAM/NVM memory architectures

Authors:

Rentong Guo

ICS '17: Proceedings of the International Conference on Supercomputing

Article No.: 26, Pages 1 - 10

https://doi.org/10.1145/3079079.3079089

Published: 14 June 2017

Abstract

Non-Volatile Memory (NVM) has recently emerged as a promising memory technology because of its nonvolatility, high density, and energy efficiency. Hybrid memory systems composed of DRAM and NVM can combine the best of both worlds: NVM offers larger capacity and near-zero standby power consumption, while DRAM provides higher performance. Many studies have advocated using DRAM as a cache for NVM. However, how to manage the DRAM cache effectively and efficiently remains an open problem. In this paper, we propose a novel Hardware/Software Cooperative Caching (HSCC) mechanism that organizes NVM and DRAM in a flat address space while logically supporting a cache/memory hierarchy. HSCC maintains the NVM-to-DRAM address mapping and tracks the access counts of NVM pages through a moderate extension to page tables and TLBs. This design significantly simplifies the hardware and offers several optimization opportunities for cache management in software layers. We thus propose utility-based cache filtering policies to improve the efficiency of the DRAM cache. Experimental results show that HSCC improves system performance by up to 9.6X (77.2% on average) and reduces energy consumption by 34.3% on average, compared to a hardware-assisted DRAM/NVM memory system. HSCC also achieves 15.4% and 14.5% performance improvements over a flat-addressable memory architecture and a Row Buffer Locality Aware (RBLA) caching policy for hybrid memories, respectively.
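
The idea summarized in the abstract (per-page access counters kept alongside an NVM-to-DRAM mapping, with software deciding which pages are worth caching) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's implementation: the names `PageEntry`, `CooperativeCache`, `fetch_threshold`, and `end_of_interval` are hypothetical, and a simple count threshold stands in for the utility-based filtering policies, whose details the abstract does not give. DRAM eviction is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PageEntry:
    """Hypothetical extended page-table entry for one NVM page."""
    nvm_frame: int
    access_count: int = 0             # accesses observed in the current interval
    dram_frame: Optional[int] = None  # NVM-to-DRAM mapping; None if not cached


class CooperativeCache:
    """Toy model: counters are updated on access, software promotes hot pages."""

    def __init__(self, dram_frames: int, fetch_threshold: int) -> None:
        self.free_dram: List[int] = list(range(dram_frames))
        self.fetch_threshold = fetch_threshold  # assumed stand-in for a utility test
        self.table: Dict[int, PageEntry] = {}

    def access(self, vpn: int) -> str:
        """Record one access and report which memory would serve it."""
        entry = self.table.setdefault(vpn, PageEntry(nvm_frame=vpn))
        entry.access_count += 1
        return "dram" if entry.dram_frame is not None else "nvm"

    def end_of_interval(self) -> List[int]:
        """Software pass: cache the hottest uncached pages that pass the filter."""
        candidates = [
            (vpn, entry) for vpn, entry in self.table.items()
            if entry.dram_frame is None
            and entry.access_count >= self.fetch_threshold
        ]
        # Hottest pages first, so scarce DRAM frames go to the most-used pages.
        candidates.sort(key=lambda item: item[1].access_count, reverse=True)
        promoted: List[int] = []
        for vpn, entry in candidates:
            if not self.free_dram:
                break  # no free DRAM frame left; eviction is not modeled here
            entry.dram_frame = self.free_dram.pop()
            promoted.append(vpn)
        # Start a fresh counting interval for every page.
        for entry in self.table.values():
            entry.access_count = 0
        return promoted


cache = CooperativeCache(dram_frames=1, fetch_threshold=3)
for _ in range(5):
    cache.access(7)   # hot page
for _ in range(3):
    cache.access(9)   # passes the threshold but is cooler than page 7
cache.access(11)      # cold page, filtered out
promoted = cache.end_of_interval()
```

With a single free DRAM frame, only the hottest page (7) is promoted; page 9 meets the threshold but finds no free frame, and page 11 is filtered out because it was touched only once. Filtering of this kind avoids spending migration bandwidth on pages that are unlikely to be reused.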

Index Terms

Hardware/software cooperative caching for hybrid DRAM/NVM memory architectures
1. Computer systems organization
  1. Architectures

Published In

ICS '17: Proceedings of the International Conference on Supercomputing

June 2017

300 pages

ISBN:9781450350204

DOI:10.1145/3079079

General Chairs:
William D. Gropp (University of Illinois at Urbana-Champaign, Illinois)
Pete Beckman (Argonne National Laboratory/Northwestern University, Illinois)

Program Chairs:
Zhiyuan Li (Purdue University, West Lafayette, Indiana)
Francisco J. Cazorla (IIIA-CSIC and Barcelona Supercomputing Center, Barcelona, Spain)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS '17

Sponsor:

SIGARCH

ICS '17: 2017 International Conference on Supercomputing

June 14 - 16, 2017

Chicago, Illinois

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Article Metrics

Total Citations: 44
Total Downloads: 1,545
Downloads (Last 12 months): 93
Downloads (Last 6 weeks): 6

Reflects downloads up to 27 Feb 2025

