EMS-i: An Efficient Memory System Design with Specialized Caching Mechanism for Recommendation Inference

Published: 09 September 2023

Abstract

Recommendation systems are widely embedded in many Internet services. For example, Meta’s deep learning recommendation model (DLRM) achieves high predictive accuracy for click-through rate by processing large-scale embedding tables. The SparseLengthSum (SLS) kernel dominates DLRM inference time because of its intensive, irregular memory accesses to the embedding vectors. Some prior works directly adopt near-data processing (NDP) solutions to obtain higher memory bandwidth and accelerate SLS. However, their inferior memory hierarchy yields a low performance-cost ratio and fails to fully exploit data locality. Although software-managed cache policies have been proposed to improve the cache hit rate, the incurred cache miss penalty is unacceptable given the high overheads of executing the corresponding programs and of communication between the host and the accelerator. To address these issues, we propose EMS-i, an efficient memory system design that integrates a Solid State Drive (SSD) into the memory hierarchy through Compute Express Link (CXL) for recommendation system inference. We specialize the caching mechanism according to the characteristics of various DLRM workloads and propose a novel prefetching mechanism to further improve performance. In addition, we carefully design the inference kernel and develop a customized mapping scheme for the SLS operation, considering the multi-level parallelism in SLS and the data locality within a batch of queries. Compared to state-of-the-art NDP solutions, EMS-i achieves up to 10.9× speedup over RecSSD and performance comparable to RecNMP with 72% energy savings. EMS-i also reduces memory cost by up to 8.7× and 6.6× relative to RecSSD and RecNMP, respectively.
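To make the memory-access pattern behind SLS concrete, the following minimal sketch (not taken from the paper) reproduces the gather-and-sum behavior the abstract refers to using PyTorch's EmbeddingBag with mode="sum". The table shape, embedding dimension, and index lists below are illustrative assumptions; in a real DLRM deployment the tables hold up to billions of rows, and the data-dependent indices are what cause the irregular memory accesses.

```python
# Illustrative SLS (gather + sum-reduce) pattern; shapes and indices are assumptions.
import torch

NUM_ROWS, EMB_DIM = 1_000_000, 64                 # hypothetical embedding table shape
table = torch.nn.EmbeddingBag(NUM_ROWS, EMB_DIM, mode="sum")

# Two queries share one flat index list; `offsets` marks where each query's indices start.
indices = torch.tensor([12, 503, 98_761, 7, 12, 42_000])  # irregular, data-dependent lookups
offsets = torch.tensor([0, 3])                             # query 0 -> first 3 indices, query 1 -> rest

pooled = table(indices, offsets)                  # one summed embedding vector per query
print(pooled.shape)                               # torch.Size([2, 64])
```

Each lookup touches a small vector at an essentially random row, which is why SLS is memory-bandwidth-bound and why the reuse of rows across a batch of queries (e.g., index 12 above) is the locality that EMS-i's caching and prefetching mechanisms target.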

REFERENCES

[1] Amazon Personalize. 2023. https://aws.amazon.com/personalize/
[2] Ehsan K. Ardestani et al. 2022. Supporting massive DLRM inference through software defined memory. In ICDCS. IEEE.
[3] Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Y. Wu. 1998. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM (JACM) 45, 6 (1998), 891–923.
[4] Artem Babenko and Victor Lempitsky. 2016. Efficient indexing of billion-scale datasets of deep descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2055–2063.
[5] Keshav Balasubramanian, Abdulla Alshabanah, Joshua D. Choe, and Murali Annavaram. 2021. cDLRM: Look ahead caching for scalable training of recommendation models. In Proceedings of the 15th ACM Conference on Recommender Systems. 263–272.
[6] Criteo Kaggle Dataset. 2020. https://www.kaggle.com/datasets/mrkmakr/criteo-dataset
[7] CXL 3.0 Specification. 2022. https://www.computeexpresslink.org/download-the-specification/
[8] DRAM Market Price. 2023. https://electronics-sourcing.com/2022/05/12/dram-price-increases-will-ease/
[9] Facebook DLRM Dataset. 2021. https://github.com/facebookresearch/dlrm_datasets
[10] Udit Gupta et al. 2020. DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference. In ISCA.
[11] HBM Market Price. 2023. https://www.networkworld.com/article/3664088/high-bandwidth-memory-hdm-delivers-impressive-performance-gains.html
[12] Ranggi Hwang, Taehun Kim, Youngeun Kwon, and Minsoo Rhu. 2020. Centaur: A chiplet-based, hybrid sparse-dense accelerator for personalized recommendations. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA’20). IEEE, 968–981.
[13] Myoungsoo Jung et al. 2017. SimpleSSD: Modeling solid state drives for holistic system simulation. IEEE Computer Architecture Letters (2017).
[14] Kaggle. 2023. https://www.kaggle.com
[15] Liu Ke et al. 2020. RecNMP: Accelerating personalized recommendation with near-memory processing. In ISCA.
[16] Liu Ke, Xuan Zhang, Jinin So, Jong-Geon Lee, Shin-Haeng Kang, Sukhan Lee, Songyi Han, YeonGon Cho, Jin Hyun Kim, Yongsuk Kwon, et al. 2021. Near-memory processing in action: Accelerating personalized recommendation with AxDIMM. IEEE Micro 42, 1 (2021), 116–127.
[17] Ji-Hoon Kim, Yeo-Reum Park, Jaeyoung Do, Soo-Young Ji, and Joo-Young Kim. 2022. Accelerating large-scale graph-based nearest neighbor search on a computational storage platform. IEEE Trans. Comput. (2022).
[18] Yoongu Kim et al. 2015. Ramulator: A fast and extensible DRAM simulator. IEEE Computer Architecture Letters (2015).
[19] Youngeun Kwon, Yunjae Lee, and Minsoo Rhu. 2019. TensorDIMM: A practical near-memory processing architecture for embeddings and tensor operations in deep learning. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. 740–753.
[20] Youngeun Kwon, Yunjae Lee, and Minsoo Rhu. 2021. Tensor casting: Co-designing algorithm-architecture for personalized recommendation training. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA’21). IEEE, 235–248.
[21] Huaicheng Li et al. 2022. Pond: CXL-based memory pooling systems for cloud platforms.
[22] Jason Lowe-Power et al. 2020. The gem5 simulator: Version 20.0+. arXiv preprint arXiv:2007.03152 (2020).
[23] Yu A. Malkov and Dmitry A. Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824–836.
[24] Meta. 2023. https://about.meta.com
[25] Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, et al. 2021. High-performance, distributed training of large-scale deep learning recommendation models. arXiv preprint arXiv:2104.05158 (2021).
[26] Maxim Naumov et al. 2019. Deep learning recommendation model for personalization and recommendation systems. arXiv (2019).
[27] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1532–1543.
[28] PM983 Product Brief. 2018. https://www.samsung.com/semiconductor/global.semi.static/
[29] sift-1b. 2022. http://corpus-texmex.irisa.fr/
[30] Mohammadreza Soltaniyeh et al. 2022. Near-storage processing for solid state drive based recommendation inference with SmartSSDs®. In ICPE.
[31] spacev-1b. 2021. https://github.com/microsoft/SPTAG/tree/main/datasets/SPACEV1B
[32] SSD Market Price. 2023. https://www.disctech.com/Samsung-PM1725B-3.2TB-MZ-PLL3T2C-MZPLK1T6HCHP-00005-Dell-73KJ7-PCIe-NVMe-SSD?partner=1011&gclid=CjwKCAiAzp6eBhByEiwA_gGq5BswRyE1M-T6X7Gjbw9dlC_GAWnrc0kRwddyzN9IQ6mbkMA3mfSvpxoCmvEQAvD_BwE
[33] Xuan Sun, Hu Wan, Qiao Li, Chia-Lin Yang, Tei-Wei Kuo, and Chun Jason Xue. 2022. RM-SSD: In-storage computing for large-scale recommendation inference. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA’22). IEEE, 1056–1070.
[34] torchrec. 2022. https://pytorch.org/torchrec/
[35] Frank Edward Walter et al. 2008. A model of a trust-based recommendation system on a social network. AAMAS (2008).
[36] Yitu Wang, Zhenhua Zhu, Fan Chen, Mingyuan Ma, Guohao Dai, Yu Wang, Hai Li, and Yiran Chen. 2021. REREC: In-ReRAM acceleration with access-aware mapping for personalized recommendation. In 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD’21). IEEE, 1–9.
[37] Mark Wilkening, Udit Gupta, et al. 2021. RecSSD: Near data processing for solid state drive based recommendation inference. In ASPLOS.
[38] Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017).
[39] Xilinx VU57P HBM. 2023. https://www.xilinx.com/products/silicon-devices/fpga/virtex-ultrascale-plus-vu57p.html
[40] Xiangmin Zhou et al. 2015. Online video recommendation in sharing community. In ICMD.


Published in

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 5s (Special Issue ESWEEK 2023)
October 2023, 1394 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3614235
Editor: Tulika Mitra

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 September 2023
        • Accepted: 13 July 2023
        • Revised: 2 June 2023
        • Received: 23 March 2023
Published in TECS Volume 22, Issue 5s

