A joint management middleware to improve training performance of deep recommendation systems with SSDs

Authors:

Carole-Jean Wu,

David Brooks

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

Pages 157 - 162

https://doi.org/10.1145/3489517.3530426

Published: 23 August 2022

Abstract

As the size and variety of training data grow over time, data preprocessing is becoming an important performance bottleneck for training deep recommendation systems. The challenge becomes more serious when training data is stored on Solid-State Drives (SSDs). Because of the access behavior gap between recommendation systems and SSDs, unused training data may be read and then filtered out during preprocessing. This work advocates a joint management middleware that avoids reading unused data by bridging this gap. The evaluation results show that our middleware effectively improves the performance of the data preprocessing phase and thereby boosts training performance.

Index Terms

A joint management middleware to improve training performance of deep recommendation systems with SSDs
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded software
2. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

July 2022

1462 pages

ISBN:9781450391429

DOI:10.1145/3489517

General Chair:
Rob Oshana
NXP

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

JUMP Center cosponsored by SRC and DARPA
Ministry of Science and Technology
Application Driving Architectures (ADA) Research Center

Conference

DAC '22

Sponsor:

SIGDA

DAC '22: 59th ACM/IEEE Design Automation Conference

July 10 - 14, 2022

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
