DOI: 10.1145/3366423.3380030
research-article

Stabilizing Neural Search Ranking Models

Published: 20 April 2020

ABSTRACT

Neural search ranking models have been not only actively studied in the information retrieval community, but also widely adopted in real-world industrial applications. However, due to the non-convexity and stochastic training of neural model formulations, the obtained models are unstable in the sense that model predictions can vary substantially between two models trained with the same configuration. In practice, new features are continuously introduced and new model architectures are explored to improve model effectiveness. In these cases, the instability of neural models leads to unnecessary document ranking changes for a large portion of queries. Such changes not only lead to an inconsistent user experience, but also add noise to online experimentation and can slow down model improvement cycles. How to stabilize neural search ranking models during model updates is an important but largely unexplored problem. Motivated by trigger analysis, we suggest balancing the trade-off between performance improvement and the number of affected queries. Concretely, we formulate it as an optimization problem whose objective is to maximize the average effect over the affected queries. We propose two heuristics and one theory-guided stabilization method to solve the optimization problem. Our proposed methods are evaluated on two of the world’s largest personal search services: Gmail search and Google Drive search. Empirical results show that our proposed methods are very effective in optimizing the proposed objective and are applicable to different model update scenarios.
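
The abstract does not spell out the formal objective, so the following is only a minimal sketch of what "average effect over the affected queries" could mean in practice. It assumes that a query is "affected" when the base and updated models return different rankings, and that the effect is the change in some per-query metric. The `QueryOutcome` record, the `is_affected` rule, and the metric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class QueryOutcome:
    """Hypothetical per-query record comparing a base and an updated ranker."""
    base_ranking: Sequence[str]  # document ids ordered by the base model
    new_ranking: Sequence[str]   # document ids ordered by the updated model
    base_metric: float           # per-query metric under the base model
    new_metric: float            # same metric under the updated model


def is_affected(q: QueryOutcome) -> bool:
    # Assumption: a query is affected if its ranked list changes at all.
    return list(q.base_ranking) != list(q.new_ranking)


def summarize(outcomes: Sequence[QueryOutcome]) -> Dict[str, float]:
    """Report how many queries changed and the mean gain among those."""
    affected = [q for q in outcomes if is_affected(q)]
    n = len(outcomes)
    total_gain = sum(q.new_metric - q.base_metric for q in outcomes)
    affected_gain = sum(q.new_metric - q.base_metric for q in affected)
    return {
        "affected_fraction": len(affected) / n if n else 0.0,
        "overall_gain": total_gain / n if n else 0.0,
        "gain_per_affected": affected_gain / len(affected) if affected else 0.0,
    }


if __name__ == "__main__":
    outcomes = [
        QueryOutcome(["a", "b"], ["b", "a"], 0.5, 1.0),   # changed, improved
        QueryOutcome(["a", "b"], ["b", "a"], 0.5, 0.25),  # changed, degraded
        QueryOutcome(["a", "b"], ["a", "b"], 1.0, 1.0),   # unchanged
        QueryOutcome(["a", "b"], ["a", "b"], 0.5, 0.5),   # unchanged
    ]
    print(summarize(outcomes))
```

Under these assumptions, an update that achieves the same overall gain while changing fewer rankings scores higher on `gain_per_affected`, which is the trade-off the abstract describes.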

Published in

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
