ABSTRACT
Identifying the topic of a search query is a challenging problem, and a solution would be valuable in diverse situations. In this work, we formulate the problem as a ranking task in which various rankers order queries by their likelihood of being related to a specific topic of interest. This establishes an explore-exploit trade-off: exploiting effective rankers may result in more on-topic queries being discovered, but exploring weaker rankers might also offer value for the overall judgement process. We show empirically that multi-armed bandit algorithms can utilise signals from divergent query rankers, resulting in improved performance in extracting on-topic queries. In particular, we find that Bayesian non-stationary approaches offer high utility. We explain why the results offer promise for several use-cases, both within the field of information retrieval and for data-driven science more generally.
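To make the formulation concrete, the following is a minimal sketch of one Bayesian non-stationary bandit, assuming Thompson sampling over Beta posteriors with exponential discounting. It is an illustration only and not necessarily the algorithm evaluated in the paper. The function name `discounted_thompson`, the `is_on_topic` judging callback, and the `gamma` discount factor are all hypothetical names introduced here. Each ranker is treated as an arm; pulling an arm means judging that ranker's next unjudged query, and the reward is 1 if the query is on-topic.

```python
import random


def discounted_thompson(rankings, is_on_topic, budget, gamma=0.95, seed=0):
    """Judge up to `budget` queries and return the on-topic ones found.

    rankings    : list of ranked query lists, one per ranker (arm)
    is_on_topic : hypothetical callback standing in for a human judgement
    gamma       : assumed discount factor; values below 1 forget old evidence
    """
    rng = random.Random(seed)
    n = len(rankings)
    alpha = [1.0] * n  # Beta(1, 1) prior: pseudo-count of on-topic judgements
    beta = [1.0] * n   # pseudo-count of off-topic judgements
    cursor = [0] * n   # next unread position in each ranking
    judged = set()
    found = []

    while len(judged) < budget:
        # Advance each cursor past queries already judged via another ranker.
        for i in range(n):
            while cursor[i] < len(rankings[i]) and rankings[i][cursor[i]] in judged:
                cursor[i] += 1
        live = [i for i in range(n) if cursor[i] < len(rankings[i])]
        if not live:
            break  # every ranking is exhausted

        # Thompson sampling: draw a plausible success rate per arm, pick the best.
        arm = max(live, key=lambda i: rng.betavariate(alpha[i], beta[i]))
        query = rankings[arm][cursor[arm]]
        judged.add(query)
        reward = 1 if is_on_topic(query) else 0
        if reward:
            found.append(query)

        # Non-stationarity: shrink all evidence back towards the prior,
        # then credit the chosen arm with the new observation.
        for i in range(n):
            alpha[i] = 1.0 + gamma * (alpha[i] - 1.0)
            beta[i] = 1.0 + gamma * (beta[i] - 1.0)
        alpha[arm] += reward
        beta[arm] += 1 - reward

    return found
```

Discounting is one simple way to handle the fact that a ranker's hit rate tends to fall as the judge moves down its list; other non-stationary schemes, such as sliding windows, would fit the same loop.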
- Cost-effective identification of on-topic search queries using multi-armed bandits