DOI: 10.1145/3412841.3441944
Research article

Cost-effective identification of on-topic search queries using multi-armed bandits

Published: 22 April 2021

ABSTRACT

Identifying the topic of a search query is a challenging problem, and a solution would be valuable in diverse situations. In this work, we formulate the problem as a ranking task in which various rankers order queries by their likelihood of being related to a specific topic of interest. This establishes an explore-exploit trade-off: exploiting effective rankers may surface more on-topic queries, but exploring weaker rankers can also add value to the overall judgement process. We show empirically that multi-armed bandit algorithms can combine signals from divergent query rankers, improving the rate at which on-topic queries are extracted. In particular, we find that Bayesian non-stationary approaches offer high utility. We explain why these results are promising for several use cases, both within information retrieval and for data-driven science more generally.
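As a rough illustration of the setup the abstract describes, the sketch below treats each query ranker as a bandit arm: pulling an arm pops that ranker's next unjudged query, and the reward is 1 if the assessor judges the query on-topic. It uses Thompson sampling with a Beta-Bernoulli model and an optional discount factor as a simple non-stationary variant. The function names, the discounting scheme, and the simulated judging callback are illustrative assumptions, not the authors' exact algorithm.

```python
# Illustrative sketch only (assumed setup, not the paper's exact method):
# Thompson sampling over K query rankers. Pulling an arm pops that ranker's
# next unjudged query; reward is 1 if the judge deems the query on-topic.
# A discount < 1.0 decays old evidence, giving a simple non-stationary variant.
import numpy as np


def select_on_topic_queries(rankers, judge, budget, discount=1.0, seed=0):
    """rankers: list of ranked query lists (best-first), one per ranker.
    judge: callable(query) -> bool, stands in for the human assessor.
    budget: total number of judgements to spend.
    Returns the queries judged on-topic, in the order they were found."""
    rng = np.random.default_rng(seed)
    k = len(rankers)
    alpha = np.ones(k)        # Beta posterior "successes" per ranker
    beta = np.ones(k)         # Beta posterior "failures" per ranker
    cursors = [0] * k         # next unjudged rank position per ranker
    judged, on_topic = set(), []

    for _ in range(budget):
        # Sample a plausible on-topic rate for each ranker; play the best draw.
        arm = int(np.argmax(rng.beta(alpha, beta)))

        # Pop the chosen ranker's next query that has not been judged yet.
        query = None
        while cursors[arm] < len(rankers[arm]):
            candidate = rankers[arm][cursors[arm]]
            cursors[arm] += 1
            if candidate not in judged:
                query = candidate
                break
        if query is None:     # this ranker's list is exhausted; skip this round
            continue

        judged.add(query)
        reward = 1.0 if judge(query) else 0.0
        if reward:
            on_topic.append(query)

        # Decay old evidence (non-stationary case), then update the played arm.
        alpha, beta = discount * alpha, discount * beta
        alpha[arm] += reward
        beta[arm] += 1.0 - reward

    return on_topic
```

In a pooling-style use, `rankers` could plausibly hold the query rankings produced by different topical scoring functions (for example, retrieval-model scores against a topic description or embedding similarity), and a `discount` below 1.0 lets the sampler track rankers whose usefulness changes as their lists are consumed.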

• Published in

  SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
  March 2021, 2075 pages
  ISBN: 9781450381048
  DOI: 10.1145/3412841
      Copyright © 2021 ACM


Publisher

Association for Computing Machinery, New York, NY, United States




Acceptance Rates

Overall acceptance rate: 1,650 of 6,669 submissions, 25%
