DOI: 10.1145/3488560.3498516
research-article
Open access

A Cooperative Neural Information Retrieval Pipeline with Knowledge Enhanced Automatic Query Reformulation

Published: 15 February 2022

Abstract

This paper presents a neural information retrieval pipeline that integrates cooperative learning of query reformulation and neural retrieval models. The pipeline first uses an automatic query reformulator to rewrite the user-issued query and then submits the reformulated query to the neural retrieval model. We simultaneously optimize the quality of the reformulated queries and the ranking performance with an alternating training strategy in which the query reformulator and the neural retrieval model learn from each other's feedback. In addition, we incorporate knowledge information into automatic query reformulation, which further improves the reformulated queries and leads to better ranking performance in the downstream neural retrieval model. We study two representative neural retrieval models, KNRM and BERT, in our pipeline. Experiments on two datasets show that the pipeline consistently improves the retrieval performance of the original neural retrieval models while adding only negligible time for automatic query reformulation.
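
The two-stage flow and the alternating schedule described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the `KNOWLEDGE` dictionary, the `Reformulator` and `Ranker` classes, the term-overlap scorer, and both update rules are hypothetical stand-ins for the paper's knowledge-enhanced reformulator and its KNRM or BERT rankers.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge source: query term -> related terms.
KNOWLEDGE: Dict[str, List[str]] = {"jaguar": ["cat", "car"]}

Example = Tuple[List[str], List[str], List[str]]  # (query, relevant doc, non-relevant doc)


class Reformulator:
    """Toy reformulator: appends the best-weighted related term for each query term."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}

    def reformulate(self, query: List[str]) -> List[str]:
        out = list(query)
        for term in query:
            candidates = KNOWLEDGE.get(term, [])
            if candidates:
                out.append(max(candidates, key=lambda c: self.weights.get(c, 0.0)))
        return out

    def update(self, expansion_terms: List[str], reward: float, lr: float = 0.5) -> None:
        # Raise or lower the weight of each chosen expansion term by the ranker's feedback.
        for t in expansion_terms:
            self.weights[t] = self.weights.get(t, 0.0) + lr * reward


class Ranker:
    """Toy ranker: weighted count of query terms that occur in the document."""

    def __init__(self) -> None:
        self.term_weight: Dict[str, float] = {}

    def score(self, query: List[str], doc: List[str]) -> float:
        doc_terms = set(doc)
        return sum(self.term_weight.get(t, 1.0) for t in query if t in doc_terms)

    def update(self, query: List[str], pos_doc: List[str], neg_doc: List[str], lr: float = 0.1) -> None:
        # Pairwise update: favor terms that match only the relevant document.
        pos, neg = set(pos_doc), set(neg_doc)
        for t in query:
            if t in pos and t not in neg:
                self.term_weight[t] = self.term_weight.get(t, 1.0) + lr
            elif t in neg and t not in pos:
                self.term_weight[t] = self.term_weight.get(t, 1.0) - lr


def train(reformulator: Reformulator, ranker: Ranker, data: List[Example], rounds: int = 3) -> None:
    for _ in range(rounds):
        # Phase A: ranker fixed; the reformulator learns from the ranker's feedback.
        for query, pos_doc, neg_doc in data:
            new_q = reformulator.reformulate(query)
            margin = ranker.score(new_q, pos_doc) - ranker.score(new_q, neg_doc)
            base = ranker.score(query, pos_doc) - ranker.score(query, neg_doc)
            reformulator.update(new_q[len(query):], reward=margin - base)
        # Phase B: reformulator fixed; the ranker trains on reformulated queries.
        for query, pos_doc, neg_doc in data:
            ranker.update(reformulator.reformulate(query), pos_doc, neg_doc)


data: List[Example] = [
    (["jaguar", "speed"], ["jaguar", "car", "top", "speed"], ["jaguar", "cat", "habitat"]),
]
reformulator, ranker = Reformulator(), Ranker()
train(reformulator, ranker, data)
print(reformulator.reformulate(["jaguar", "speed"]))
```

In this sketch the reward for the reformulator is the change in the ranking margin between the relevant and non-relevant document caused by the expansion, so each component is updated only while the other is held fixed. The actual feedback signals and models in the paper may differ.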

Supplementary Material

MP4 File (WSDM22-fp787.mp4)


    Published In

    WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
    February 2022
    1690 pages
    ISBN:9781450391320
    DOI:10.1145/3488560
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 February 2022

    Author Tags

    1. knowledge graph
    2. neural ir
    3. query reformulation

    Qualifiers

    • Research-article

    Funding Sources

    • Beijing Outstanding Young Scientist Program
    • National Key Research and Development Program of China
    • Natural Science Foundation of China

    Conference

    WSDM '22

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (Last 12 months): 145
    • Downloads (Last 6 weeks): 11
    Reflects downloads up to 13 Feb 2025

    Cited By

    • (2024) Automatic Query Generation Based on Adaptive Naked Mole-Rate Algorithm. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19492-2. Online publication date: 27-Jun-2024
    • (2023) Boosting legal case retrieval by query content selection with large language models. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, pp. 176-184. DOI: 10.1145/3624918.3625328. Online publication date: 26-Nov-2023
    • (2023) Automatic Synonym Extraction and Context-based Query Reformulation for Points-of-Interest Search. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 3072-3078. DOI: 10.1109/ICDE55515.2023.00235. Online publication date: Apr-2023
    • (2022) HIAE: Hyper-Relational Interaction Aware Embedding for Link Prediction. 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 355-360. DOI: 10.1109/ICTAI56018.2022.00058. Online publication date: Oct-2022
