DOI: 10.1145/3359555.3359564
research-article

Accelerating recommender system training 15x with RAPIDS

Published: 20 September 2019

Abstract

In this paper we present the novel aspects of our 15th-place solution to the RecSys Challenge 2019, focused on accelerating feature generation and model training. In our final solution we sped up training of our model by a factor of 15.6x, from 891.8s (14m52s) to 57.2s, through a combination of the RAPIDS.AI cuDF library for preprocessing, a custom batch dataloader, LAMB with extreme batch sizes, and an update to the PyTorch kernel responsible for calculating the embedding gradient. Using cuDF we also accelerated our feature generation by a factor of 9.7x by performing the computations on the GPU, reducing the time taken to generate the features used in our model from 51 minutes to 5. We demonstrate these optimizations on the fastai tabular model, which we relied on extensively in our final ensemble. With training time so drastically reduced, the iteration involved in generating new features and training new models becomes much more fluid, allowing deep-learning-based recommender systems to be prototyped in hours rather than days.
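The GPU-accelerated feature generation described above relies on cuDF mirroring the pandas API, so groupby aggregations move to the GPU with minimal code changes. A minimal sketch of the pattern, written here with pandas and an illustrative session-log schema (the column and feature names are assumptions for illustration, not the paper's actual features):

```python
import pandas as pd

# Hypothetical session log resembling session-based click data;
# this schema is illustrative, not the actual challenge dataset.
df = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2, 3],
    "item_id":    [10, 11, 10, 12, 12, 10],
    "step":       [1, 2, 3, 1, 2, 1],
})

# Session-level aggregate features via a single groupby -- the pattern
# that cuDF executes on the GPU with near pandas-identical syntax
# (e.g. `import cudf; df = cudf.DataFrame(...)`, rest unchanged).
session_feats = df.groupby("session_id").agg(
    n_steps=("step", "max"),
    n_unique_items=("item_id", "nunique"),
).reset_index()

print(session_feats)
```

In the paper's workflow the same calls would run via cuDF on the GPU; the speedup comes from the aggregation kernels, not from changing the feature logic.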

Cited By

  • (2022) Fast JSON parser using metaprogramming on GPU. 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA), 1-10. DOI: 10.1109/DSAA54385.2022.10032381 (13-Oct-2022)
  • (2022) Recommendations on Streaming Data: E-Tourism Event Stream Processing Recommender System. Intelligent and Fuzzy Systems, 514-523. DOI: 10.1007/978-3-031-09176-6_59 (2-Jul-2022)
  • (2020) Session-based Hotel Recommendations Dataset. ACM Transactions on Intelligent Systems and Technology 12(1), 1-20. DOI: 10.1145/3412379 (13-Nov-2020)
  • (2020) Accelerating and Expanding End-to-End Data Science Workflows with DL/ML Interoperability Using RAPIDS. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 3503-3504. DOI: 10.1145/3394486.3406702 (23-Aug-2020)
  • (2020) A systematic literature review on hardware implementation of artificial intelligence algorithms. The Journal of Supercomputing 77(2), 1897-1938. DOI: 10.1007/s11227-020-03325-8 (28-May-2020)


Published In

RecSys Challenge '19: Proceedings of the Workshop on ACM Recommender Systems Challenge
September 2019
49 pages
ISBN:9781450376679
DOI:10.1145/3359555

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU acceleration
  2. neural networks
  3. recommender systems

Qualifiers

  • Research-article

Conference

RecSys Challenge '19

Acceptance Rates

Overall Acceptance Rate 11 of 15 submissions, 73%
