research-article

Cost-effective identification of on-topic search queries using multi-armed bandits

Authors:
David E. Losada, Universidade de Santiago de Compostela, Spain
Matthias Herrmann, University of Regensburg, Germany
David Elsweiler, University of Regensburg, Germany
SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing, March 2021, pages 645–654. https://doi.org/10.1145/3412841.3441944

Published: 22 April 2021

ABSTRACT

Identifying the topic of a search query is a challenging problem whose solution would be valuable in diverse situations. In this work, we formulate the problem as a ranking task in which various rankers order queries by their likelihood of being related to a specific topic of interest. This establishes an explore-exploit trade-off: exploiting effective rankers may lead to more on-topic queries being discovered, but exploring weaker rankers may also add value to the overall judgement process. We show empirically that multi-armed bandit algorithms can utilise signals from divergent query rankers, improving performance in extracting on-topic queries. In particular, we find that Bayesian non-stationary approaches offer high utility. We explain why the results are promising for several use cases, both within information retrieval and in data-driven science more generally.
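To make the explore-exploit idea concrete, here is a minimal sketch (not the authors' implementation) of one common Bayesian non-stationary bandit: Thompson sampling over Beta-Bernoulli arms with discounted evidence. Each arm is a query ranker. Pulling an arm means judging the next unjudged query in that ranker's list, and the reward is 1 if the assessor marks it on-topic. The function name `bandit_judging`, the `discount` forgetting factor, and the `is_on_topic` callback standing in for the human assessor are hypothetical choices made for illustration. The paper's exact algorithms and parameters may differ.

```python
import random
from typing import Callable, List, Set


def bandit_judging(rankings: List[List[str]],
                   is_on_topic: Callable[[str], bool],
                   budget: int,
                   discount: float = 0.9,
                   seed: int = 0) -> List[str]:
    """Judge up to `budget` queries, letting a discounted Thompson-sampling
    bandit decide which ranker (arm) supplies each next query.

    Hypothetical sketch: `discount` in (0, 1]; 1.0 gives stationary sampling.
    Returns the on-topic queries found, in the order they were judged.
    """
    rng = random.Random(seed)
    k = len(rankings)
    # Discounted evidence per arm; the Beta(1, 1) prior is added when sampling.
    successes = [0.0] * k
    failures = [0.0] * k
    pos = [0] * k  # next position to read in each ranking
    judged: Set[str] = set()
    found: List[str] = []

    for _ in range(budget):
        # Skip queries that were already judged via another ranker.
        for i in range(k):
            while pos[i] < len(rankings[i]) and rankings[i][pos[i]] in judged:
                pos[i] += 1
        active = [i for i in range(k) if pos[i] < len(rankings[i])]
        if not active:
            break  # every ranking is exhausted

        # Thompson sampling: one posterior draw per arm, pull the largest draw.
        arm = max(active, key=lambda i: rng.betavariate(1.0 + successes[i],
                                                        1.0 + failures[i]))
        query = rankings[arm][pos[arm]]
        pos[arm] += 1
        judged.add(query)

        reward = 1.0 if is_on_topic(query) else 0.0  # assessor's judgement
        if reward:
            found.append(query)

        # Non-stationary update: shrink the pulled arm's past evidence,
        # then add the new outcome.
        successes[arm] = discount * successes[arm] + reward
        failures[arm] = discount * failures[arm] + (1.0 - reward)

    return found
```

For example, `bandit_judging([["a", "b", "c"], ["x", "a", "y"]], lambda q: q in {"a", "b"}, budget=10)` judges all five distinct queries and returns both on-topic ones, in an order that depends on the random draws. Setting `discount=1.0` recovers stationary Thompson sampling. Values below 1 weight recent judgements more heavily, which lets a ranker's estimate follow its hit rate as the bandit moves deeper into its list, where precision typically drops.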

Index terms: Information systems

Published in: SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing, March 2021, 2075 pages. ISBN: 9781450381048. DOI: 10.1145/3412841
Conference Chairs: Chih-Cheng Hung (Kennesaw State University), Jiman Hong (Soongsil University, South Korea)
Program Chairs: Alessio Bechini (University of Pisa, Italy), Eunjee Song (Baylor University)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Overall acceptance rate: 1,650 of 6,669 submissions (25%)
Article metrics: 1 total citation; 46 total downloads (5 in the last 12 months, 0 in the last 6 weeks)