ABSTRACT
In this paper we present the novel aspects of our 15th place solution to the RecSys Challenge 2019, focusing on accelerating feature generation and model training. In our final solution we sped up model training by a factor of 15.6x, from 891.8s (14m52s) to 57.2s, through a combination of the RAPIDS.AI cuDF library for preprocessing, a custom batch dataloader, LAMB with extreme batch sizes, and an update to the PyTorch kernel responsible for calculating the embedding gradient. Using cuDF we also accelerated feature generation by a factor of 9.7x by performing the computations on the GPU, reducing the time taken to generate the features used in our model from 51 minutes to 5. We demonstrate these optimizations on the fastai tabular model, which we relied on extensively in our final ensemble. With training time so drastically reduced, the iteration involved in generating new features and training new models becomes much more fluid, allowing deep learning based recommender systems to be prototyped in hours rather than days.
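Because cuDF mirrors most of the pandas DataFrame API, the GPU feature-generation pattern the abstract describes can be sketched with pandas and moved to the GPU largely by swapping the import. The sketch below is illustrative only: the column names and data are hypothetical, not the actual competition schema, and the named-aggregation style shown assumes a cuDF version that supports it.

```python
import pandas as pd  # swap for `import cudf as pd` to run the same pattern on GPU

# Hypothetical session log loosely resembling session-based click data
# (columns and values are illustrative, not the challenge dataset).
df = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2],
    "item_id":    [10, 11, 10, 12, 12],
    "step":       [1, 2, 3, 1, 2],
})

# Session-level aggregate features computed entirely with groupby --
# the kind of columnar work that benefits from GPU execution under cuDF.
feats = df.groupby("session_id").agg(
    session_length=("step", "max"),
    unique_items=("item_id", "nunique"),
)
```

The resulting `feats` frame can then be joined back onto the session rows to serve as tabular model inputs.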
REFERENCES
- Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, NY, USA, 785--794.
- Jeremy Howard, Rachel Thomas, and Sylvain Gugger. 2018. Fast.ai Library. http://docs.fast.ai
- Bytedance Inc. 2019. BytePS. https://github.com/bytedance/byteps
- Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 3146--3154. http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf
- Peter Knees, Yashar Deldjoo, Farshad Bakhshandegan Moghaddam, Jens Adamczak, Gerard-Paul Leyson, and Philipp Monreal. 2019. RecSys Challenge 2019: Session-based Hotel Recommendations. In Proceedings of the Thirteenth ACM Conference on Recommender Systems (RecSys '19). ACM, New York, NY, USA, 2.
- Wes McKinney. 2010. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Stéfan van der Walt and Jarrod Millman (Eds.). 51--56.
- NVidia. 2017. APEX. https://github.com/NVIDIA/apex
- NVidia. 2019. CUDA Profiler Users Guide. https://docs.nvidia.com/cuda/pdf/CUDA_Profiler_Users_Guide.pdf
- Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic Differentiation in PyTorch. In NIPS Autodiff Workshop.
- RAPIDS.AI. 2019. RAPIDS.AI cuDF repository. https://github.com/rapidsai/cuDF
- Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. 2011. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. In Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 693--701. http://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent.pdf
- Alexander Sergeev and Mike Del Balso. 2018. Horovod: fast and easy distributed deep learning in TensorFlow. CoRR abs/1802.05799 (2018). arXiv:1802.05799 http://arxiv.org/abs/1802.05799
- Mark J van der Laan, Eric C Polley, and Alan E Hubbard. 2007. Super Learner. Statistical Applications in Genetics and Molecular Biology 6, 1 (2007).
- Stéfan van der Walt, S. Chris Colbert, and Gaël Varoquaux. 2011. The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering 13, 2 (2011), 22--30.
- Yang You, Jing Li, Jonathan Hseu, Xiaodan Song, James Demmel, and Cho-Jui Hsieh. 2019. Reducing BERT Pre-Training Time from 3 Days to 76 Minutes. CoRR abs/1904.00962 (2019). arXiv:1904.00962 http://arxiv.org/abs/1904.00962
Index Terms
- Accelerating recommender system training 15x with RAPIDS