DOI: 10.1145/3604915.3608882
Efficient Data Representation Learning in Google-scale Systems

Published: 14 September 2023

ABSTRACT

"Garbage in, Garbage out" is a familiar maxim to ML practitioners and researchers, because the quality of a learned data representation is highly crucial to the quality of any ML model that consumes it as an input. To handle systems that serve billions of users at millions of queries per second (QPS), we need representation learning algorithms with significantly improved efficiency. At Google, we have dedicated thousands of iterations to develop a set of powerful techniques that efficiently learn high quality data representations. We have thoroughly validated these methods through offline evaluation, online A/B testing, and deployed these in over 50 models across major Google products. In this paper, we consider a generalized data representation learning problem that allows us to identify feature embeddings and crosses as common challenges. We propose two solutions, including: 1. Multi-size Unified Embedding to learn high-quality embeddings; and 2. Deep Cross Network V2 for learning effective feature crosses. We discuss the practical challenges we encountered and solutions we developed during deployment to production systems, compare with SOTA methods, and report offline and online experimental results. This work sheds light on the challenges and opportunities for developing next-gen algorithms for web-scale systems.

