DOI: 10.1145/3079079.3079089
research-article

Hardware/software cooperative caching for hybrid DRAM/NVM memory architectures

Published: 14 June 2017

Abstract

Non-Volatile Memory (NVM) has recently attracted attention for its nonvolatility, high density, and energy efficiency. Hybrid memory systems composed of DRAM and NVM offer the best of both worlds: NVM provides larger capacity and near-zero standby power consumption, while DRAM provides higher performance. Many studies have advocated using DRAM as a cache for NVM. However, how to manage the DRAM cache effectively and efficiently remains an open problem. In this paper, we propose a novel Hardware/Software Cooperative Caching (HSCC) mechanism that organizes NVM and DRAM in a flat address space while logically supporting a cache/memory hierarchy. HSCC maintains the NVM-to-DRAM address mapping and tracks the access counts of NVM pages through a moderate extension to page tables and TLBs. This significantly simplifies the hardware design and offers several optimization opportunities for cache management in software layers. We thus propose utility-based cache filtering policies to improve the efficiency of the DRAM cache. Experimental results show that HSCC improves system performance by up to 9.6X (77.2% on average) and reduces energy consumption by 34.3% on average, compared to a hardware-assisted DRAM/NVM memory system. HSCC also achieves 15.4% and 14.5% performance improvements over a flat-addressable memory architecture and a Row Buffer Locality Aware (RBLA) caching policy for hybrid memories, respectively.
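
The abstract describes per-page access counting, an NVM-to-DRAM mapping, and a filter that decides which NVM pages are worth caching. The following is a minimal illustrative sketch of that idea, not the authors' implementation. The names `PageEntry`, `HybridMemory`, and `fetch_threshold`, and the simple "count reaches a threshold" rule, are assumptions made for illustration; the abstract does not specify the actual policy, data layout, or eviction behavior, and eviction is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PageEntry:
    """Hypothetical extended page-table entry for one NVM page."""
    access_count: int = 0             # accesses recorded for this page
    dram_frame: Optional[int] = None  # NVM-to-DRAM mapping; None if not cached


@dataclass
class HybridMemory:
    dram_frames: int                  # number of DRAM frames used as cache
    fetch_threshold: int = 4          # assumed filter threshold, not from the paper
    table: Dict[int, PageEntry] = field(default_factory=dict)
    free_frames: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # All DRAM frames start out free.
        self.free_frames = list(range(self.dram_frames))

    def access(self, nvm_page: int) -> str:
        """Record one access and report which memory served it."""
        entry = self.table.setdefault(nvm_page, PageEntry())
        entry.access_count += 1
        if entry.dram_frame is not None:
            return "dram"
        # Filter: migrate only pages that have been accessed often enough,
        # and only while a free DRAM frame exists (no eviction in this sketch).
        if entry.access_count >= self.fetch_threshold and self.free_frames:
            entry.dram_frame = self.free_frames.pop()
        return "nvm"
```

In this sketch, a page is served from NVM until its counter reaches the threshold; the access that crosses the threshold still goes to NVM, and later accesses hit DRAM. Pages that never reach the threshold, or that arrive when DRAM is full, stay in NVM.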




    Published In

    ICS '17: Proceedings of the International Conference on Supercomputing
    June 2017
    300 pages
    ISBN:9781450350204
    DOI:10.1145/3079079
    • General Chairs: William D. Gropp, Pete Beckman
    • Program Chairs: Zhiyuan Li, Francisco J. Cazorla
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. caching
    2. hybrid memory
    3. non-volatile memory (NVM)

    Qualifiers

    • Research-article

    Conference

    ICS '17

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 93
    • Downloads (Last 6 weeks): 6
    Reflects downloads up to 27 Feb 2025

    Cited By

    • (2024) A Hybrid Memory Data Placement Strategy for Edge Computing. 2024 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA) (211-218). DOI: 10.1109/ISPA63168.2024.00035. Online publication date: 30-Oct-2024
    • (2024) Mitigating Write Disturbance in Non-Volatile Memory via Coupling Machine Learning with Out-of-Place Updates. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (1184-1198). DOI: 10.1109/HPCA57654.2024.00092. Online publication date: 2-Mar-2024
    • (2024) A read-efficient and write-optimized hash table for Intel Optane DC Persistent Memory. Future Generation Computer Systems 161 (49-65). DOI: 10.1016/j.future.2024.06.028. Online publication date: Dec-2024
    • (2023) ACTION: Adaptive Cache Block Migration in Distributed Cache Architectures. ACM Transactions on Architecture and Code Optimization 20:2 (1-19). DOI: 10.1145/3572911. Online publication date: 1-Mar-2023
    • (2022) Transformer: An OS-Supported Reconfigurable Hybrid Memory Architecture. Applied Sciences 12:24 (12995). DOI: 10.3390/app122412995. Online publication date: 18-Dec-2022
    • (2022) A Metadata Prefetching Mechanism for Hybrid Memory Architectures. IEICE Transactions on Electronics E105.C:6 (232-243). DOI: 10.1587/transele.2021LHP0004. Online publication date: 1-Jun-2022
    • (2022) FlexHM: A Practical System for Heterogeneous Memory with Flexible and Efficient Performance Optimizations. ACM Transactions on Architecture and Code Optimization 20:1 (1-26). DOI: 10.1145/3565885. Online publication date: 16-Dec-2022
    • (2022) Power-optimized Deployment of Key-value Stores Using Storage Class Memory. ACM Transactions on Storage 18:2 (1-26). DOI: 10.1145/3511905. Online publication date: 10-Mar-2022
    • (2022) Software Hint-Driven Data Management for Hybrid Memory in Mobile Systems. ACM Transactions on Embedded Computing Systems 21:1 (1-18). DOI: 10.1145/3494536. Online publication date: 14-Jan-2022
    • (2022) SwapKV: A Hotness Aware In-memory Key-Value Store for Hybrid Memory Systems. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2021.3077264. Online publication date: 2022
