ABSTRACT
Neural search ranking models have not only been actively studied in the information retrieval community, but are also widely deployed in real-world industrial applications. However, due to the non-convexity of neural model formulations and the stochasticity of their training, the obtained models are unstable: predictions can vary substantially between two models trained with the same configuration. In practice, new features are continuously introduced and new model architectures are explored to improve model effectiveness. In these cases, the instability of neural models leads to unnecessary ranking changes for a large fraction of queries. Such changes not only produce an inconsistent user experience, but also add noise to online experiments and can slow down model improvement cycles. How to stabilize neural search ranking models during model updates is an important but largely unexplored problem. Motivated by trigger analysis, we propose balancing the trade-off between performance improvement and the number of affected queries. Concretely, we formulate this as an optimization problem whose objective is to maximize the average effect over the affected queries. We propose two heuristic stabilization methods and one theory-guided stabilization method to solve this optimization problem. We evaluate the proposed methods on two of the world's largest personal search services: Gmail search and Google Drive search. Empirical results show that the proposed methods are highly effective at optimizing the proposed objective and are applicable to different model update scenarios.
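To make the objective concrete, the sketch below computes the average effect over affected queries for a toy model update. The function name, the boolean "affected" encoding (a query counts as affected if the new model changes its ranking), and the metric deltas are illustrative assumptions, not the paper's actual implementation; the point is only that shrinking the affected set while keeping most of the metric gain raises this average.

```python
def average_effect(deltas, affected):
    """Average metric change over affected queries (hypothetical sketch).

    deltas:   per-query metric change (new model minus old model)
    affected: per-query flags, True if the new model changed the ranking
    """
    changed = [d for d, a in zip(deltas, affected) if a]
    if not changed:  # no affected queries: the update is a no-op
        return 0.0
    return sum(changed) / len(changed)


# Toy example: four queries, three of which see a ranking change.
deltas = [0.10, 0.00, -0.02, 0.05]
affected = [True, False, True, True]
print(average_effect(deltas, affected))  # total gain 0.13 over 3 queries
```

A stabilization method that suppressed the third query's ranking change (a small loss, -0.02) would keep nearly all of the total gain while reducing the affected set, so the average effect per affected query would increase.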