research-article

Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations

Authors:

Derek Zhiyuan Cheng,

Simon Xiaoming Wang,

Ed H. Chi

WWW '20: Companion Proceedings of the Web Conference 2020

Pages 441 - 447

https://doi.org/10.1145/3366424.3386195

Published: 20 April 2020

Abstract

Learning query and item representations is important for building large-scale recommendation systems. In many real applications with a huge catalog of items to recommend, the problem of efficiently retrieving the top-k items for a user's query from a deep corpus leads to a family of factorized modeling approaches in which queries and items are jointly embedded into a low-dimensional space. In this paper, we first showcase how to apply a two-tower neural network framework, also known as a dual encoder in the natural language community, to improve a large-scale production app recommendation system. Furthermore, we offer a novel negative sampling approach called Mixed Negative Sampling (MNS). In particular, unlike the commonly used batch or unigram sampling methods, MNS uses a mixture of batch and uniformly sampled negatives to tackle the selection bias of implicit user feedback. We conduct extensive offline experiments using a large-scale production dataset and show that MNS outperforms other baseline sampling methods. We also conduct online A/B testing and demonstrate that the two-tower retrieval model based on MNS significantly improves retrieval quality by encouraging more high-quality app installs.
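
To make the idea of mixing batch and uniformly sampled negatives concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a dot-product score between tower outputs and a plain softmax cross-entropy loss; the abstract does not specify the loss, so any bias correction or weighting the paper may apply is left out. The function name `mixed_negative_softmax_loss`, its parameter names, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np


def mixed_negative_softmax_loss(query_emb, item_emb, uniform_item_emb):
    """Softmax loss over in-batch items plus uniformly sampled items.

    query_emb:        (B, d) query-tower outputs for the batch
    item_emb:         (B, d) item-tower outputs; row i is the positive for query i
    uniform_item_emb: (B_prime, d) item-tower outputs for items drawn uniformly
                      from the whole catalog (hypothetical input)
    """
    # Candidates shared by every query: the B in-batch items followed by
    # the B_prime uniformly sampled items.
    candidates = np.concatenate([item_emb, uniform_item_emb], axis=0)
    logits = query_emb @ candidates.T  # shape (B, B + B_prime)

    # Numerically stable log-softmax over the candidate axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for query i sits in column i; all other columns act as negatives.
    idx = np.arange(query_emb.shape[0])
    return -log_probs[idx, idx].mean()


# Toy usage with random stand-ins for the two towers (assumption: in practice
# these would be the outputs of trained query and item networks).
rng = np.random.default_rng(0)
B, B_prime, d, catalog_size = 4, 6, 8, 100
catalog = rng.normal(size=(catalog_size, d))
queries = rng.normal(size=(B, d))
positive_ids = rng.integers(0, catalog_size, size=B)
uniform_ids = rng.integers(0, catalog_size, size=B_prime)  # uniform over the catalog

loss = mixed_negative_softmax_loss(queries, catalog[positive_ids], catalog[uniform_ids])
```

Passing an empty `uniform_item_emb` of shape `(0, d)` reduces the sketch to plain in-batch negative sampling, which makes the two settings easy to compare. The sketch does not filter out uniformly sampled items that happen to coincide with a positive.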

Recommendations

Cross-Batch Negative Sampling for Training Two-Tower Recommenders
SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

The two-tower architecture has been widely applied for learning item and user representations, which is important for large-scale recommender systems. Many two-tower models are trained using various in-batch negative sampling strategies, where the ...

Latent Probabilistic Model for Context-Aware Recommendations
WI-IAT '13: Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 01

Recommender systems (RS) are software tools that provide personalized recommendations of relevant items to individual users. However, most of them do not take into account additional contextual information that may affect user preferences, such as place, ...

Bayesian probabilistic model for context-aware recommendations
iiWAS '15: Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services

Context-aware recommender systems that provide better recommendations for users by using their rating history in different situations have been proposed. Because incorporating all contextual information can make the data sparser and degrade the ...

Published In

WWW '20: Companion Proceedings of the Web Conference 2020

April 2020

854 pages

ISBN:9781450370240

DOI:10.1145/3366424

Editors:
Amal El Fallah Seghrouchni (Sorbonne University, France)
Gita Sukthankar (University of Central Florida, United States)
Tie-Yan Liu (Microsoft Research Asia, China)
Maarten van Steen (University of Twente, Netherlands)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

Total Citations: 61
Total Downloads: 2,645
Downloads (Last 12 months): 283
Downloads (Last 6 weeks): 23

Reflects downloads up to 02 Mar 2025

Citations

Cited By

Zhai J, Liao L, Liu X, Wang Y, Li R, Cao X, Gao L, Gong Z, Gu F, He J, Lu Y, Shi Y, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024). Actions speak louder than words. Proceedings of the 41st International Conference on Machine Learning, 58484-58509. Online publication date: 21-Jul-2024.
https://dl.acm.org/doi/10.5555/3692070.3694484

Martin C, Boutilier C, Meshi O, Sandholm T, Larson K (2024). Model-free preference elicitation. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 3493-3503. Online publication date: 3-Aug-2024.
https://dl.acm.org/doi/10.24963/ijcai.2024/387

Chen L, Li Y (2024). Enhancing chemical synthesis: a two-stage deep neural network for predicting feasible reaction conditions. Journal of Cheminformatics 16:1. Online publication date: 24-Jan-2024.
https://doi.org/10.1186/s13321-024-00805-4

Iliadis D, De Baets B, Pahikkala T, Waegeman W (2024). A comparison of embedding aggregation strategies in drug–target interaction prediction. BMC Bioinformatics 25:1. Online publication date: 6-Feb-2024.
https://doi.org/10.1186/s12859-024-05684-y

Wu X, Puthenputhussery A, Shang H, Kang C, Fang Y (2024). Meta Learning to Rank for Sparsely Supervised Queries. ACM Transactions on Information Systems. Online publication date: 8-Oct-2024.
https://doi.org/10.1145/3698876

Lin J, Li Q, Xie G, Guan Z, Jiang Y, Xu T, Zhang Z, Zhao P, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Mitigating Sample Selection Bias with Robust Domain Adaption in Multimedia Recommendation. Proceedings of the 32nd ACM International Conference on Multimedia, 7581-7590. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3680615

Tang Y, Zhang R, Guo J, de Rijke M, Chen W, Cheng X (2024). Listwise Generative Retrieval Models via a Sequential Learning Process. ACM Transactions on Information Systems 42:5, 1-31. Online publication date: 29-Apr-2024.
https://dl.acm.org/doi/10.1145/3653712

Wang J, Lu H, Liu Y, Ma H, Wang Y, Gu Y, Zhang S, Han N, Bi S, Baugher L, Chi E, Chen M (2024). LLMs for User Interest Exploration in Large-scale Recommendation Systems. Proceedings of the 18th ACM Conference on Recommender Systems, 872-877. Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3640457.3688161

Lin H, Chen H, Yang J, Xu J (2024). Bootstrapping Conditional Retrieval for User-to-Item Recommendations. Proceedings of the 18th ACM Conference on Recommender Systems, 755-757. Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3640457.3688057

Tandoi M, Solis Morales D (2024). Explore versus repeat: insights from an online supermarket. Proceedings of the 18th ACM Conference on Recommender Systems, 787-789. Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3640457.3688050