A hybrid memory architecture supporting fine-grained data migration

Chi, Ye; Yue, Jianhui; Liao, Xiaofei; Liu, Haikun; Jin, Hai

doi:10.1007/s11704-023-2675-y

Research Article
Published: 22 January 2024

Frontiers of Computer Science, Volume 18, article number 182103 (2024)

Ye Chi¹, Jianhui Yue², Xiaofei Liao¹, Haikun Liu¹ & Hai Jin¹

Abstract

Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often exploit page migration to take full advantage of the different memory media. Most previous proposals migrate data at a granularity of 4 KB pages, and thus waste memory bandwidth and DRAM resources. In this paper, we propose Mocha, a non-hierarchical architecture that physically organizes DRAM and NVM in a flat address space, but manages them as a cache/memory hierarchy. Since the commercial NVM device, the Intel Optane DC Persistent Memory Module (DCPMM), actually accesses the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache in 256-byte blocks to match this feature of Optane. This design not only enables fine-grained data migration and management for the DRAM cache, but also avoids write amplification on Intel Optane DCPMM. We also create an Indirect Address Cache (IAC) in the Hybrid Memory Controller (HMC) and propose a reverse address mapping table in the DRAM to speed up address translation and cache replacement. Moreover, we exploit a utility-based caching mechanism to filter cold blocks in the NVM and further improve the efficiency of the DRAM cache. We implement Mocha in an architectural simulator. Experimental results show that, compared with HSCC, a typical hybrid memory architecture, Mocha improves application performance by 8.2% on average (up to 24.6%), and reduces energy consumption by 6.9% and data migration traffic by 25.9% on average.
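To make the granularity argument concrete, the following minimal sketch compares the bytes moved when data is migrated in 4 KB pages with the bytes moved in 256-byte blocks. It is an illustration only, not Mocha's implementation: the helper names `split_address` and `migration_bytes` and the access trace are hypothetical, and the model simply assumes that every distinct unit touched by the trace is migrated once.

```python
from typing import List, Tuple

PAGE_SIZE = 4096   # 4 KB page, the migration unit of most previous proposals
BLOCK_SIZE = 256   # one Optane block, the DRAM cache unit named in the abstract


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (page number, block index within the page, byte offset within the block)."""
    page = addr // PAGE_SIZE
    block_in_page = (addr % PAGE_SIZE) // BLOCK_SIZE
    offset = addr % BLOCK_SIZE
    return page, block_in_page, offset


def migration_bytes(addresses: List[int], unit: int) -> int:
    """Bytes moved if every distinct `unit`-sized region touched is migrated once."""
    touched_units = {addr // unit for addr in addresses}
    return len(touched_units) * unit


# Hypothetical trace (an assumption): three accesses that fall in two pages.
trace = [0x1000, 0x1010, 0x2F00]

page_traffic = migration_bytes(trace, PAGE_SIZE)    # two 4 KB pages
block_traffic = migration_bytes(trace, BLOCK_SIZE)  # two 256-byte blocks
```

For this trace, page-granularity migration moves 8192 bytes while block-granularity migration moves 512 bytes, since each 4 KB page holds 16 blocks and only one block per page is touched. Real workloads with denser access patterns would narrow this gap.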

Acknowledgements

This work was jointly supported by the National Key Research and Development Program of China (No. 2022YFB4500303) and the National Natural Science Foundation of China (NSFC) (Grant Nos. 62072198, 61832006, 61825202, 61929103).

Author information

Authors and Affiliations

National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Ye Chi, Xiaofei Liao, Haikun Liu & Hai Jin
Department of Computer Science, Michigan Technological University, Michigan, 49931, USA
Jianhui Yue

Corresponding author

Correspondence to Xiaofei Liao.

Additional information

Ye Chi received the BS degree from the Huazhong University of Science and Technology (HUST), China in 2016. He is now working toward the PhD degree in the School of Computer Science and Technology, HUST, China. His research interests focus on computer architecture, die-stacked DRAM, HMC and hybrid memory system architecture.

Jianhui Yue received the PhD degree from the University of Maine, USA in 2012. He is an assistant professor in the Computer Science Department, Michigan Technological University, USA. Before joining Michigan Technological University, he was a visiting assistant professor at Miami University, USA. His research interests include computer architecture and systems. He has served on the program committees of international conferences, including ICPP and CCGrid. He received the Best Paper Award at IEEE CLUSTER’07 and was a Best Paper Award candidate at HPCA’13.

Xiaofei Liao received the PhD degree in computer science and engineering from the Huazhong University of Science and Technology (HUST), China in 2005. He has served as a reviewer for many conferences and journals. His research interests are in the areas of system software, P2P systems, cluster computing, and streaming services. He is a member of the IEEE and the IEEE Computer Society.

Haikun Liu is a professor in the School of Computer Science and Technology, Huazhong University of Science and Technology (HUST), China. He received his PhD degree in computer science and technology from HUST, China in 2012. His current research interests include in-memory computing, virtualization technologies, cloud computing, and distributed systems. He is a senior member of CCF and a member of the IEEE.

Hai Jin is a Chair Professor of computer science and engineering at Huazhong University of Science and Technology (HUST), China. Jin received his PhD in computer engineering from HUST in 1994. In 1996, he was awarded a German Academic Exchange Service fellowship to visit the Technical University of Chemnitz in Germany. Jin worked at The University of Hong Kong, China between 1998 and 2000, and as a visiting scholar at the University of Southern California between 1999 and 2000. He was awarded the Excellent Youth Award from the National Science Foundation of China in 2001. Jin is a Fellow of IEEE, Fellow of CCF, and a life member of the ACM. He has co-authored more than 20 books and published over 900 research papers. His research interests include computer architecture, parallel and distributed computing, big data processing, data storage, and system security.
