research-article

Cache in Hand: Expander-Driven CXL Prefetcher for Next Generation CXL-SSD

Authors:

Myoungsoo Jung

HotStorage '23: Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems

Pages 24 - 30

https://doi.org/10.1145/3599691.3603406

Published: 10 July 2023

Abstract

Integrating Compute Express Link (CXL) with SSDs enables scalable access to large memory, but CXL-SSDs remain slower than DRAM. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from the host CPU to CXL-SSDs. ExPAND uses a heterogeneous prediction algorithm for prefetching and ensures data consistency through CXL.mem's back-invalidation. We also examine prefetch timeliness, which depends on accurate latency estimation: because ExPAND is aware of CXL's multi-tiered switching, it obtains the end-to-end latency of each CXL-SSD and can estimate prefetch timeliness precisely. This approach reduces reliance on the CXL-SSD and lets the host access most data directly from its cache. ExPAND improves graph application performance by 3.5x, outperforming CXL-SSD pools that use a variety of prefetching strategies.
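The abstract ties prefetch timeliness to the end-to-end latency of each CXL-SSD behind a multi-tiered switch hierarchy. The Python sketch below illustrates that relationship under simple assumptions that are not taken from the paper: each switch level adds a fixed one-way latency that both the request and the response traverse, the device adds a fixed access latency, and demand accesses arrive at a steady interval. The names `CxlPath`, `prefetch_distance`, and `is_timely`, and all latency values, are hypothetical; this is not ExPAND's actual estimator.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class CxlPath:
    """Hypothetical latency model of the route from the host to one CXL-SSD."""

    # Assumed one-way traversal latency (ns) of each switch level between host and device.
    switch_hop_ns: List[float] = field(default_factory=list)
    # Assumed device-side access latency (ns) once the request reaches the CXL-SSD.
    device_ns: float = 0.0

    def end_to_end_ns(self) -> float:
        # Request and response each cross every switch level once.
        return 2 * sum(self.switch_hop_ns) + self.device_ns


def prefetch_distance(path: CxlPath, access_gap_ns: float) -> int:
    """Number of demand accesses ahead a prefetch must be issued to hide the path latency."""
    if access_gap_ns <= 0:
        raise ValueError("access_gap_ns must be positive")
    return math.ceil(path.end_to_end_ns() / access_gap_ns)


def is_timely(path: CxlPath, issue_ns: float, demand_ns: float) -> bool:
    """A prefetch is timely if its data arrives no later than the demand access."""
    return issue_ns + path.end_to_end_ns() <= demand_ns


if __name__ == "__main__":
    # Illustrative values only; they are not measurements from the paper.
    direct = CxlPath(device_ns=300.0)
    switched = CxlPath(switch_hop_ns=[50.0, 50.0], device_ns=300.0)
    for name, path in (("direct", direct), ("two-level switch", switched)):
        print(name, path.end_to_end_ns(), prefetch_distance(path, 120.0))
```

With these illustrative numbers and a 120 ns gap between demand accesses, the directly attached device needs a prefetch distance of 3, while the device behind two switch levels needs 5. An estimator that ignored the switch hops would therefore issue prefetches for the deeper device too late.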

Published In

HotStorage '23: Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems

July 2023

131 pages

ISBN:9798400702242

DOI:10.1145/3599691

General Chairs: Ali Anwar (University of Minnesota), Ningfang Mi (Northeastern University)
Program Chairs: Vasily Tarasov (IBM Research), Yiying Zhang (University of California, San Diego)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

In-Cooperation

USENIX

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

HotStorage '23: 15th ACM Workshop on Hot Topics in Storage and File Systems

July 9, 2023, Boston, MA, USA

Acceptance Rates

Overall Acceptance Rate 34 of 87 submissions, 39%

Bibliometrics

Total citations: 10

Total downloads: 1,645 (670 in the last 12 months; 43 in the last 6 weeks), reflecting downloads up to 16 Feb 2025.

Cited By

Jalalian S, Patel S, Hajidehi M, Seltzer M, Fedorova A, Bagchi S, Zhang Y (2024). EXTMEM. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pages 397-408. DOI: 10.5555/3691992.3692017. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.5555/3691992.3692017
Liu Y, Ren Y, Liu M, Li H, Guo H, Miao X, Hu X, Chen H, Ma X, Won Y (2024). Optimizing file systems on heterogeneous memory by integrating DRAM cache with virtual memory management. Proceedings of the 22nd USENIX Conference on File and Storage Technologies, pages 71-88. DOI: 10.5555/3650697.3650702. Online publication date: 27-Feb-2024.
https://dl.acm.org/doi/10.5555/3650697.3650702
Zhang X, Bhimani J, Pei S, Lee E, Lee S, Seong Y, Kim E, Choi C, Nam E, Choi J, Kim B (2024). Storage Abstractions for SSDs: The Past, Present, and Future. ACM Transactions on Storage, 21(1):1-44. DOI: 10.1145/3708992. Online publication date: 30-Dec-2024.
https://dl.acm.org/doi/10.1145/3708992
Tirumalasetty C, Annapareddy N (2024). Contention aware DRAM caching for CXL-enabled pooled memory. Proceedings of the International Symposium on Memory Systems, pages 157-171. DOI: 10.1145/3695794.3695808. Online publication date: 30-Sep-2024.
https://dl.acm.org/doi/10.1145/3695794.3695808
Wang L, Zhang X, Wang S, Jiang Z, Lu T, Chen M, Luo S, Huang K (2024). Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access. ACM Transactions on Architecture and Code Optimization, 21(3):1-28. DOI: 10.1145/3663479. Online publication date: 9-May-2024.
https://dl.acm.org/doi/10.1145/3663479
Zhang J, Nguyen M, Kashyap S, Kannan S (2024). Context-aware Prefetching for Near-Storage Accelerators. Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, pages 131-136. DOI: 10.1145/3655038.3665956. Online publication date: 8-Jul-2024.
https://dl.acm.org/doi/10.1145/3655038.3665956
Gouk D, Kang S, Bae H, Ryu E, Lee S, Kim D, Jang J, Jung M (2024). Breaking Barriers: Expanding GPU Memory with Sub-Two Digit Nanosecond Latency CXL Controller. Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, pages 108-115. DOI: 10.1145/3655038.3665953. Online publication date: 8-Jul-2024.
https://dl.acm.org/doi/10.1145/3655038.3665953
Huang W, Zhou J, Wang M, Zhou Y, Zhang X, Zhu F, Li S, Wang K, Wu F (2024). TieredHM: Hotspot-Optimized Hash Indexing for Memory-Semantic SSD-Based Hybrid Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(6):1755-1768. DOI: 10.1109/TCAD.2024.3354693. Online publication date: Jun-2024.
https://doi.org/10.1109/TCAD.2024.3354693
Zhou Z, Chen Y, Zhang T, Wang Y, Shu R, Xu S, Cheng P, Qu L, Xiong Y, Zhang J, Sun G (2024). NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1518-1531. DOI: 10.1109/MICRO61859.2024.00111. Online publication date: 2-Nov-2024.
https://doi.org/10.1109/MICRO61859.2024.00111
Guo Y, Li G (2024). A CXL-Powered Database System: Opportunities and Challenges. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 5593-5604. DOI: 10.1109/ICDE60146.2024.00447. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00447
