DOI: 10.1145/3599691.3603406
research-article

Cache in Hand: Expander-Driven CXL Prefetcher for Next Generation CXL-SSD

Published: 10 July 2023

Abstract

Integrating Compute Express Link (CXL) with SSDs allows scalable access to large memory, but CXL-SSDs are slower than DRAM. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from the host CPU to CXL-SSDs. ExPAND uses a heterogeneous prediction algorithm for prefetching and ensures data consistency with CXL.mem's back-invalidation. We examine prefetch timeliness for accurate latency estimation. Being aware of CXL multi-tiered switching, ExPAND provides the end-to-end latency of each CXL-SSD and precise prefetch timeliness estimations. Our method reduces reliance on the CXL-SSD and lets most data be accessed directly from the host cache. ExPAND improves graph application performance by 3.5x, surpassing CXL-SSD pools with diverse prefetching strategies.
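
To make the notion of prefetch timeliness concrete, the following is a minimal sketch and not the authors' implementation. It assumes a purely additive latency model in which each CXL switch tier adds a fixed delay in both directions; the class name, function names, and all latency figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CxlSsdPath:
    """Hypothetical description of the route from the host to one CXL-SSD."""
    switch_hops: int          # number of CXL switch tiers traversed
    hop_latency_ns: float     # assumed one-way delay added per switch tier
    device_latency_ns: float  # assumed media plus controller latency


def end_to_end_latency_ns(path: CxlSsdPath) -> float:
    """Round-trip estimate: request and response each cross every tier."""
    return 2 * path.switch_hops * path.hop_latency_ns + path.device_latency_ns


def is_timely(path: CxlSsdPath, lead_time_ns: float) -> bool:
    """A prefetch counts as timely here if it is issued at least one full
    end-to-end latency before the predicted demand access."""
    return lead_time_ns >= end_to_end_latency_ns(path)


# Illustrative values only: same device, different switch depth.
near = CxlSsdPath(switch_hops=1, hop_latency_ns=100.0, device_latency_ns=3000.0)
far = CxlSsdPath(switch_hops=3, hop_latency_ns=100.0, device_latency_ns=3000.0)

for name, path in (("near", near), ("far", far)):
    print(name, end_to_end_latency_ns(path), is_timely(path, lead_time_ns=3400.0))
```

Under these assumed numbers, the same lead time is enough for the device behind one switch tier but too short for the device behind three, which is why a prefetcher that ignores switch depth could misjudge timeliness.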


Published In

HotStorage '23: Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems
July 2023
131 pages
ISBN: 9798400702242
DOI: 10.1145/3599691

Sponsors

In-Cooperation

  • USENIX

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

HotStorage '23

Acceptance Rates

Overall Acceptance Rate 34 of 87 submissions, 39%


Cited By

  • (2024) EXTMEM. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pp. 397-408. DOI: 10.5555/3691992.3692017. Online publication date: 10-Jul-2024
  • (2024) Optimizing file systems on heterogeneous memory by integrating DRAM cache with virtual memory management. Proceedings of the 22nd USENIX Conference on File and Storage Technologies, pp. 71-88. DOI: 10.5555/3650697.3650702. Online publication date: 27-Feb-2024
  • (2024) Storage Abstractions for SSDs: The Past, Present, and Future. ACM Transactions on Storage, 21:1, pp. 1-44. DOI: 10.1145/3708992. Online publication date: 30-Dec-2024
  • (2024) Contention aware DRAM caching for CXL-enabled pooled memory. Proceedings of the International Symposium on Memory Systems, pp. 157-171. DOI: 10.1145/3695794.3695808. Online publication date: 30-Sep-2024
  • (2024) Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access. ACM Transactions on Architecture and Code Optimization, 21:3, pp. 1-28. DOI: 10.1145/3663479. Online publication date: 9-May-2024
  • (2024) Context-aware Prefetching for Near-Storage Accelerators. Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, pp. 131-136. DOI: 10.1145/3655038.3665956. Online publication date: 8-Jul-2024
  • (2024) Breaking Barriers: Expanding GPU Memory with Sub-Two Digit Nanosecond Latency CXL Controller. Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, pp. 108-115. DOI: 10.1145/3655038.3665953. Online publication date: 8-Jul-2024
  • (2024) TieredHM: Hotspot-Optimized Hash Indexing for Memory-Semantic SSD-Based Hybrid Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43:6, pp. 1755-1768. DOI: 10.1109/TCAD.2024.3354693. Online publication date: Jun-2024
  • (2024) NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1518-1531. DOI: 10.1109/MICRO61859.2024.00111. Online publication date: 2-Nov-2024
  • (2024) A CXL-Powered Database System: Opportunities and Challenges. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 5593-5604. DOI: 10.1109/ICDE60146.2024.00447. Online publication date: 13-May-2024
