DOI: 10.1145/3489517.3530426
research-article

A joint management middleware to improve training performance of deep recommendation systems with SSDs

Published: 23 August 2022

Abstract

As the size and variety of training data grow over time, data preprocessing is becoming a major performance bottleneck in training deep recommendation systems. The problem is more severe when the training data is stored on Solid-State Drives (SSDs): because of the gap between how recommendation systems access training data and how SSDs serve reads, unused training data may be read from the SSD only to be filtered out during preprocessing. This work proposes a joint management middleware that bridges this access-behavior gap so that unused data is not read. The evaluation results show that the middleware effectively improves the performance of the data preprocessing phase and thereby boosts overall training performance.
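The abstract's core observation is that when training samples are stored record by record, a job that consumes only some features must still read the surrounding bytes from the SSD and discard them. The following is a hypothetical toy model, not the paper's middleware. It assumes a 4 KiB read granularity, fixed-size features, and two illustrative layouts (row-major records versus feature-grouped regions), and it counts how many pages each layout must read to serve the same subset of features. The function names are invented for this illustration.

```python
"""Toy model of the access-behavior gap described in the abstract.

Hypothetical illustration only: the page size, fixed-size features, and both
layouts are assumptions, not the middleware proposed in the paper.
"""

PAGE_SIZE = 4096  # assumed SSD read granularity in bytes


def pages_touched(byte_ranges, page_size=PAGE_SIZE):
    """Return the set of page indices overlapped by half-open [start, end) ranges."""
    pages = set()
    for start, end in byte_ranges:
        pages.update(range(start // page_size, (end - 1) // page_size + 1))
    return pages


def row_layout_ranges(num_samples, feature_sizes, used):
    """Row layout: each sample stores all of its features back to back,
    so the needed features are interleaved with unused ones."""
    record_size = sum(feature_sizes)
    offsets = [sum(feature_sizes[:i]) for i in range(len(feature_sizes))]
    ranges = []
    for s in range(num_samples):
        base = s * record_size
        for f in sorted(used):
            start = base + offsets[f]
            ranges.append((start, start + feature_sizes[f]))
    return ranges


def grouped_layout_ranges(num_samples, feature_sizes, used):
    """Feature-grouped layout: all values of one feature are contiguous,
    so unused features occupy pages that never need to be read."""
    ranges = []
    base = 0
    for f, size in enumerate(feature_sizes):
        region = num_samples * size
        if f in used:
            ranges.append((base, base + region))
        base += region
    return ranges


if __name__ == "__main__":
    sizes = [8] * 64      # assumed: 64 features of 8 bytes each (512 B per sample)
    used = {0, 1, 2, 3}   # assumed: the model consumes only 4 of them
    n = 10_000
    row = len(pages_touched(row_layout_ranges(n, sizes, used)))
    grouped = len(pages_touched(grouped_layout_ranges(n, sizes, used)))
    print(f"row layout pages read: {row}")          # 1250
    print(f"grouped layout pages read: {grouped}")  # 79
```

Under these assumptions the row layout reads all 1,250 pages, because every page holds part of some sample's first four features. The grouped layout reads only 79 pages. Variable-length features, compression, and the log-structured merge (LSM) organization listed in the author tags are not modeled.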


Information

Published In

ACM Conferences
DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022
1462 pages
ISBN:9781450391429
DOI:10.1145/3489517
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data arranger
  2. data preprocessing
  3. deep recommendation systems
  4. hardware/software co-design
  5. log-structured merge (LSM)
  6. solid-state drives (SSDs)
  7. training performance

Conference

DAC '22
DAC '22: 59th ACM/IEEE Design Automation Conference
July 10-14, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


Article Metrics

  • Downloads (last 12 months): 101
  • Downloads (last 6 weeks): 6
Reflects downloads up to 05 Mar 2025

Cited By

  • (2024) How to Steal CPU Idle Time When Synchronous I/O Mode Becomes Promising. Proceedings of the 61st ACM/IEEE Design Automation Conference, 1-6. DOI: 10.1145/3649329.3655929. Online publication date: 23-Jun-2024.
  • (2024) SpeedyLoader. Proceedings of the 4th Workshop on Machine Learning and Systems, 65-72. DOI: 10.1145/3642970.3655824. Online publication date: 22-Apr-2024.
  • (2024) GEAR: Graph-Evolving Aware Data Arranger to Enhance the Performance of Traversing Evolving Graphs on SCM. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43, 11, 3674-3684. DOI: 10.1109/TCAD.2024.3447222. Online publication date: Nov-2024.
  • (2023) Failure Tolerant Training With Persistent Memory Disaggregation Over CXL. IEEE Micro 43, 2, 66-75. DOI: 10.1109/MM.2023.3237548. Online publication date: 1-Mar-2023.
