ABSTRACT
Neural search ranking models have not only been actively studied in the information retrieval community, but are also widely deployed in real-world industrial applications. However, due to the non-convexity of neural model formulations and the stochasticity of their training, the obtained models are unstable: predictions can vary substantially between two models trained with the same configuration. In practice, new features are continuously introduced and new model architectures are explored to improve model effectiveness. In these cases, the instability of neural models leads to unnecessary ranking changes for a large fraction of queries. Such changes not only produce an inconsistent user experience, but also add noise to online experiments and can slow down model improvement cycles. How to stabilize neural search ranking models during model updates is an important but largely unexplored problem. Motivated by trigger analysis, we propose balancing the trade-off between performance improvement and the number of affected queries. Concretely, we formulate this as an optimization problem whose objective is to maximize the average effect over the affected queries. We propose two heuristic stabilization methods and one theory-guided stabilization method to solve this optimization problem. We evaluate the proposed methods on two of the world's largest personal search services: Gmail search and Google Drive search. Empirical results show that the proposed methods are highly effective at optimizing the proposed objective and are applicable to different model update scenarios.
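To make the objective concrete, the sketch below computes the average effect over affected queries for a toy model update. The function name, the boolean "affected" encoding (a query counts as affected if the new model changes its ranking), and the metric deltas are illustrative assumptions, not the paper's actual implementation; the point is only that shrinking the affected set while keeping most of the metric gain raises this average.

```python
def average_effect(deltas, affected):
    """Average metric change over affected queries (hypothetical sketch).

    deltas:   per-query metric change (new model minus old model)
    affected: per-query flags, True if the new model changed the ranking
    """
    changed = [d for d, a in zip(deltas, affected) if a]
    if not changed:  # no affected queries: the update is a no-op
        return 0.0
    return sum(changed) / len(changed)


# Toy example: four queries, three of which see a ranking change.
deltas = [0.10, 0.00, -0.02, 0.05]
affected = [True, False, True, True]
print(average_effect(deltas, affected))  # total gain 0.13 over 3 queries
```

A stabilization method that suppressed the third query's ranking change (a small loss, -0.02) would keep nearly all of the total gain while reducing the affected set, so the average effect per affected query would increase.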