abstract

Efficient Data Representation Learning in Google-scale Systems

Authors:
Derek Zhiyuan Cheng

Google DeepMind, USA

Google DeepMind, USA

0009-0000-7943-8328
View Profile

,
Ruoxi Wang

Google DeepMind, USA

Google DeepMind, USA

0009-0002-4824-1316
View Profile

,
Wang-Cheng Kang

Google DeepMind, USA

Google DeepMind, USA

0009-0006-8795-3665
View Profile

,
Benjamin Coleman

Google DeepMind, USA

Google DeepMind, USA

0009-0001-8045-3717
View Profile

,
Yin Zhang

Google DeepMind, USA

Google DeepMind, USA

0000-0002-5169-9849
View Profile

,
Jianmo Ni

Google DeepMind, USA

Google DeepMind, USA

0000-0002-6863-8073
View Profile

,
Jonathan Valverde

Google DeepMind, USA

Google DeepMind, USA

0009-0007-4587-1458
View Profile

,
Lichan Hong

Google DeepMind, USA

Google DeepMind, USA

0009-0004-9563-554X
View Profile

,
Ed Chi

Google DeepMind, USA

Google DeepMind, USA

0000-0003-3230-5338
View Profile

RecSys '23: Proceedings of the 17th ACM Conference on Recommender SystemsSeptember 2023Pages 267–271https://doi.org/10.1145/3604915.3608882

Published:14 September 2023Publication History

RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems

Pages 267–271

ABSTRACT

"Garbage in, Garbage out" is a familiar maxim to ML practitioners and researchers, because the quality of a learned data representation is highly crucial to the quality of any ML model that consumes it as an input. To handle systems that serve billions of users at millions of queries per second (QPS), we need representation learning algorithms with significantly improved efficiency. At Google, we have dedicated thousands of iterations to develop a set of powerful techniques that efficiently learn high quality data representations. We have thoroughly validated these methods through offline evaluation, online A/B testing, and deployed these in over 50 models across major Google products. In this paper, we consider a generalized data representation learning problem that allows us to identify feature embeddings and crosses as common challenges. We propose two solutions, including: 1. Multi-size Unified Embedding to learn high-quality embeddings; and 2. Deep Cross Network V2 for learning effective feature crosses. We discuss the practical challenges we encountered and solutions we developed during deployment to production systems, compare with SOTA methods, and report offline and online experimental results. This work sheds light on the challenges and opportunities for developing next-gen algorithms for web-scale systems.

References

Rohan Anil, Sandra Gadanho, Da Huang, Nijith Jacob, Zhuoshu Li, Dong Lin, Todd Phillips, Cristina Pop, Kevin Regan, Gil I. Shamir, Rakesh Shivanna, and Qiqi Yan. 2022. On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models. In Proceedings of the 5th Workshop on Online Recommender Systems and User Modeling co-located with the 16th ACM Conference on Recommender Systems, ORSUM@RecSys, João Vinagre, Marie Al-Ghossein, Alípio Mário Jorge, Albert Bifet, and Ladislav Peska (Eds.).Google Scholar
Yoshua Bengio, Réjean Ducharme, and Pascal Vincent. 2000. A Neural Probabilistic Language Model. In Advances in Neural Information Processing Systems. 932–938.Google Scholar
Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and Ed H Chi. 2018. Latent cross: Making use of context in recurrent recommender systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 46–54.Google ScholarDigital Library
Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, 2016. Wide & deep learning for recommender systems. In Workshop on Deep Learning for Recommender Systems.Google ScholarDigital Library
Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, and Derek Zhiyuan Cheng. 2023. Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems. arxiv:2305.12102 [cs.LG]Google Scholar
Aditya Desai, Li Chou, and Anshumali Shrivastava. 2022. Random Offset Block Embedding (ROBE) for compressed embedding tables in deep learning recommendation systems. Proceedings of Machine Learning and Systems 4 (2022), 762–778.Google Scholar
Aditya Desai and Anshumali Shrivastava. 2022. The trade-offs of model size in large recommendation models : 100GB to 10MB Criteo-tb DLRM model. In Advances in Neural Information Processing Systems, Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (Eds.). https://openreview.net/forum?id=c9I_NArDIjDGoogle Scholar
Kaize Ding, Albert Jiongqian Liang, Bryan Perrozi, Ting Chen, Ruoxi Wang, Lichan Hong, Ed H. Chi, Huan Liu, and Derek Zhiyuan Cheng. 2023. HyperFormer: Learning Expressive Sparse Feature Representations via Hypergraph Transformer. arxiv:2305.17386 [cs.IR]Google Scholar
Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, and Pratik Worah. 2023. Learning Rate Schedules in the Presence of Distribution Shift. arXiv preprint arXiv:2303.15634 (2023).Google Scholar
Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017).Google Scholar
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.Google ScholarCross Ref
Norman P Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, 2023. Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings. Proceedings of the 50th International Symposium of Computer Architecture (2023).Google ScholarDigital Library
Wang-Cheng Kang, Derek Zhiyuan Cheng, Ting Chen, Xinyang Yi, Dong Lin, Lichan Hong, and Ed H. Chi. 2020. Learning Multi-Granular Quantized Embeddings for Large-Vocab Categorical Features in Recommender Systems. In Companion Proceedings of the Web Conference 2020 (Taipei, Taiwan) (WWW ’20). Association for Computing Machinery, New York, NY, USA, 562–566. https://doi.org/10.1145/3366424.3383416Google ScholarDigital Library
Wang-Cheng Kang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Ting Chen, Lichan Hong, and Ed H. Chi. 2021. Learning to Embed Categorical Features without Embedding Tables for Recommendation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Virtual Event, Singapore) (KDD ’21). Association for Computing Machinery, New York, NY, USA, 840–850. https://doi.org/10.1145/3447548.3467304Google ScholarDigital Library
Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, and Ali Farhadi. 2022. Matryoshka Representation Learning. arxiv:2205.13147 [cs.LG]Google Scholar
Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, and Guangzhong Sun. 2018. xDeepFM. In SIGKDD.Google Scholar
Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems. 3111–3119.Google Scholar
Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, Yinbin Ma, Junjie Yang, Ellie Wen, Hong Li, Lin Yang, Chonglin Sun, Whitney Zhao, Dimitry Melts, Krishna Dhulipala, K. R. Kishore, Tyler Graf, Assaf Eisenman, Kiran Kumar Matam, Adi Gangidi, Guoqiang Jerry Chen, Manoj Krishnan, Avinash Nayak, Krishnakumar Nair, Bharath Muthiah, Mahmoud khorashadi, Pallab Bhattacharya, Petr Lapukhov, Maxim Naumov, Ajit Mathews, Lin Qiao, Mikhail Smelyanskiy, Bill Jia, and Vijay Rao. 2022. Software-hardware co-design for fast and scalable training of deep learning recommendation models. In The 49th Annual International Symposium on Computer Architecture. ACM, 993–1011.Google ScholarDigital Library
Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G Azzolini, 2019. Deep learning recommendation model for personalization and recommendation systems. arXiv preprint arXiv:1906.00091 (2019).Google Scholar
Vardan Papyan, XY Han, and David L Donoho. 2020. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences 117, 40 (2020), 24652–24663.Google ScholarCross Ref
David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533.Google ScholarCross Ref
Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, and Jiyan Yang. 2020. Compositional embeddings using complementary partitions for memory-efficient recommendation systems. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 165–175.Google ScholarDigital Library
Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, and Jian Tang. 2019. Autoint: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1161–1170.Google ScholarDigital Library
Dan Svenstrup, Jonas Hansen, and Ole Winther. 2017. Hash embeddings for efficient word representations. Advances in Neural Information Processing Systems (2017).Google Scholar
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).Google Scholar
Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. 2017. Deep & Cross Network for Ad Click Predictions. In Proceedings of the ADKDD’17. 1–7.Google ScholarDigital Library
Ruoxi Wang, Rakesh Shivanna, Derek Cheng, Sagar Jain, Dong Lin, Lichan Hong, and Ed Chi. 2021. Dcn v2: Improved deep & cross network and practical lessons for web-scale learning to rank systems. In Proceedings of the web conference 2021. 1785–1797.Google ScholarDigital Library
Yingcan Wei, Matthias Langer, Fan Yu, Minseok Lee, Jie Liu, Ji Shi, and Zehuan Wang. 2022. A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models. In Sixteenth ACM Conference on Recommender Systems. ACM, 408–419.Google ScholarDigital Library
Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning. 1113–1120.Google ScholarDigital Library
Ji Yang, Xinyang Yi, Derek Zhiyuan Cheng, Lichan Hong, Yang Li, Simon Xiaoming Wang, Taibai Xu, and Ed H. Chi. 2020. Mixed Negative Sampling for Learning Two-Tower Neural Networks in Recommendations. In Companion Proceedings of the Web Conference 2020 (Taipei, Taiwan) (WWW ’20). 441–447.Google ScholarDigital Library
Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Kumthekar, Zhe Zhao, Li Wei, and Ed H. Chi. 2019. Sampling-bias-corrected neural modeling for large corpus item recommendations. In Proceedings of the 13th ACM Conference on Recommender Systems. ACM, 269–277.Google ScholarDigital Library

Recommendations

Data representation learning via dictionary learning and self-representation
Abstract
Dictionary learning is an effective feature learning method, leading to many remarkable results in data representation and classification tasks. However, dictionary learning is performed on the original data representation. In some cases, the ...
Read More
Data-efficient hierarchical reinforcement learning
NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems

Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, ...
Read More
Representation Learning: A Review and New Perspectives

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
September 2023
1406 pages
ISBN:9798400702419
DOI:10.1145/3604915
Editors:
Jie Zhang,
Li Chen,
Shlomo Berkovsky,
Min Zhang,
Tommaso di Noia,
Justin Basilico,
Luiz Pizzato,
Yang Song
Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 September 2023
Check for updates
Author Tags
Ads
Data Representation Learning
Efficiency
Embedding Learning
Feature Cross
Recommendation Systems
Scalability
Search
Qualifiers
- abstract
- Research
- Refereed limited
Conference

Acceptance Rates
Overall Acceptance Rate254of1,295submissions,20%
Upcoming Conference
RecSys '24

Sponsor:

sigchi

18th ACM Conference on Recommender Systems

October 14 - 18, 2024

Bari , Italy
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 1,516
  Total Downloads
- Downloads (Last 12 months)1,516
- Downloads (Last 6 weeks)39
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format

Efficient Data Representation Learning in Google-scale Systems

RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems

ABSTRACT

References

Cited By

Recommendations

Data representation learning via dictionary learning and self-representation

Data-efficient hierarchical reinforcement learning

Representation Learning: A Review and New Perspectives