Contextualized query expansion via unsupervised chunk selection for text retrieval
Introduction
In an information retrieval (IR) system, documents are ranked in descending order of their relevance to a given query. Recent advances have shown substantial performance gains on ad-hoc text retrieval tasks by using large-scale pre-trained transformer-based language models to evaluate the relevance of individual query-document pairs, as in BERT-based (Devlin, Chang, Lee, & Toutanova, 2019) re-rankers, which improve upon classical IR models by a wide margin on different benchmarks (Dai and Callan, 2019, Li et al., 2020, Nogueira and Cho, 2019, Yilmaz et al., 2019). However, vocabulary mismatches between the query and the document, owing to their differences in length and format, make the relevance evaluation sub-optimal. Various query expansion methods have been proposed to address this mismatch by exploiting pseudo relevance feedback (PRF) to expand the query (Amati, 2003, Lavrenko and Croft, 2001, Metzler and Croft, 2007, Rocchio, 1971) before evaluating the relevance of a query-document pair. For the expansion selection, existing works rely either on words, as in RM3 (Lavrenko & Croft, 2001) and KL (Amati, 2003), or on phrases (Metzler & Croft, 2007). Among neural approaches, recent neural PRF architectures (Li et al., 2018, Wang, Luo et al., 2020) use entire feedback documents for expansion. However, as has been shown by Padaki, Dai, and Callan (2020), existing expansion methods may not be directly applicable to BERT-based ranking models. In fact, Padaki et al. (2020) demonstrate that using RM3 (Lavrenko & Croft, 2001) on top of a fine-tuned BERT model significantly dampens ranking quality, highlighting the difficulty of performing query expansion for BERT-based ranking models.
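As background, classical PRF methods such as RM3 build a feedback term model from the top-ranked documents and interpolate it with the original query model. The following is a minimal, self-contained sketch of this idea; the function name and the simple score-weighted term model are illustrative simplifications, not the exact RM3 estimation:

```python
from collections import Counter

def rm3_expansion(query_terms, feedback_docs, doc_scores, k=10, alpha=0.5):
    """Toy RM3-style expansion: weight candidate terms by the retrieval
    scores of the pseudo-relevant documents they occur in, then
    interpolate with the original query model."""
    term_weights = Counter()
    for doc, score in zip(feedback_docs, doc_scores):
        counts = Counter(doc)
        total = sum(counts.values())
        for term, tf in counts.items():
            # P(term | doc) weighted by the document's relevance score
            term_weights[term] += score * tf / total
    # normalize the feedback model
    norm = sum(term_weights.values()) or 1.0
    feedback_model = {t: w / norm for t, w in term_weights.items()}
    # interpolate a uniform original-query model with the top-k feedback terms
    expanded = Counter({t: alpha / len(query_terms) for t in query_terms})
    for term, w in sorted(feedback_model.items(), key=lambda x: -x[1])[:k]:
        expanded[term] += (1 - alpha) * w
    return dict(expanded)

weights = rm3_expansion(
    ["neural", "ranking"],
    [["neural", "ranking", "bert"], ["bert", "retrieval", "ranking"]],
    [2.0, 1.0],
)
```

In RM3 proper, document weights come from query-likelihood scores and the interpolation parameter is tuned per collection; the sketch only conveys the overall shape of word-level expansion that BERT-QE replaces with chunk-level selection.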
Besides, the reliance on PRF information makes existing expansion methods prone to non-relevant information in the feedback documents, which can pollute the query and lead to topic shift (Macdonald & Ounis, 2007). Moreover, the expansions selected by existing methods are either too short, e.g., individual words (Amati, 2003, Lavrenko and Croft, 2001), or too long, e.g., entire documents (Li et al., 2018, Wang, Luo et al., 2020), to introduce relevant information without also bringing in non-relevant information.
To bridge this gap, in this work we propose a novel query expansion model, coined BERT-QE, designed particularly for BERT-based ranking models. The proposed BERT-QE selects relevant information from PRF documents in an unsupervised manner, in the form of text chunks, providing more flexibility in the expansion granularity. Furthermore, a novel architecture is proposed to re-weight the relevance of individual documents using the selected expansions on top of the BERT-based ranker, achieving superior performance by mitigating the topic-shift problem. In particular, given a query and a list of feedback documents from an initial ranking (e.g., from BM25), we re-rank the documents in three sequential phases, as illustrated in Fig. 1. In phase one, the documents are re-ranked with a fine-tuned BERT model and the top-ranked documents are used as PRF documents; in phase two, these PRF documents are decomposed into text chunks of fixed length (e.g., 10 words), and the relevance of each individual chunk is evaluated; finally, to assess the relevance of a given document, the selected chunks and the original query are used jointly to score the document. We release the source code and related resources for reproducibility. The contributions of this work are as follows:
- 1.
A novel query expansion model is proposed to exploit the strength of contextualized model BERT in identifying relevant information from feedback documents.
- 2.
Evaluation on two standard TREC test collections, namely Robust04 and GOV2, demonstrates that the proposed BERT-QE-LLL, which uses BERT-Large in all three phases, significantly advances the performance of BERT-Large on both shallow and deep pools.
- 3.
Several novel components are proposed on top of BERT-QE to better trade off efficiency against effectiveness. In particular, we investigate the use of smaller BERT components in different phases and demonstrate that a smaller variant of BERT-QE, e.g., BERT-QE-LMT, can significantly outperform BERT-Large on the shallow pool with no more than 1% extra computational cost; meanwhile, a larger variant, e.g., BERT-QE-LLS, can significantly outperform BERT-Large on both shallow and deep pools with only 2% more computation. In addition, we propose two novel building blocks for phases two and three to further improve efficiency.
This paper is an extension of a previous paper that appeared in Findings of ACL: EMNLP 2020 (Zheng et al., 2020). We extend the previous version as follows:
- 1.
Two novel alternative designs are proposed for phase two and phase three, providing a better trade-off between effectiveness and efficiency.
- 2.
The efficiency is investigated more thoroughly, especially in terms of document ranking, which was not discussed in the preliminary version.
- 3.
More analyses are included to provide a better understanding of the proposed method and the results.
- 4.
A detailed analysis is conducted for the configurations of fine-tuning the first-round BERT re-ranker, which is crucial for the document ranking task.
The remainder of this paper is organized as follows. In Section 2, we recap related work. In Section 3, we describe the proposed BERT-QE model and the methods to trade off efficiency against effectiveness. Experimental settings and evaluation results are presented in Sections 4 and 5, respectively. We conduct further experimental analyses in Section 6, before concluding this work in Section 7.
BERT for IR
In the past few years, based on Word2vec (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013) or GloVe (Pennington, Socher, & Manning, 2014) word embeddings, neural ranking models such as DRMM (Guo, Fan, Ai, & Croft, 2016), PACRR (Hui, Yates, Berberich, & de Melo, 2017), ADRM (Liu, Li et al., 2020), and KNRM (Xiong, Dai, Callan, Liu, & Power, 2017) have shown the ability to improve upon classical probabilistic retrieval approaches. More recently, inspired by the success of contextualized models such as BERT (Devlin et al., 2019), pre-trained transformer-based models have been widely adopted for ranking tasks.
Method
In this section, we describe BERT-QE, which takes as input a query and a ranked list of documents for that query (e.g., from an unsupervised ranking model) and outputs a re-ranked list based on the expanded queries. The proposed BERT-QE performs query expansion in an unsupervised manner and can thus be used on top of any BERT-based ranking model.
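The three phases can be sketched end-to-end as follows. Here `bert_score` is a placeholder for a fine-tuned BERT relevance model, and the softmax-weighted combination in phase three, as well as the hyper-parameter names (`kd`, `kc`, `alpha`, chunk length `m`), are illustrative assumptions rather than the exact formulation:

```python
import math

def chunks(text, m=10):
    """Decompose a document into fixed-length chunks of m words (phase two)."""
    words = text.split()
    return [" ".join(words[i:i + m]) for i in range(0, len(words), m)]

def bert_qe_rerank(query, docs, bert_score, kd=10, kc=10, alpha=0.4):
    """Sketch of the three-phase BERT-QE re-ranking.
    bert_score(a, b) -> relevance of text b with respect to text a."""
    # Phase 1: first-round BERT re-ranking; top-kd docs serve as PRF docs.
    ranked = sorted(docs, key=lambda d: bert_score(query, d), reverse=True)
    prf_docs = ranked[:kd]
    # Phase 2: score every chunk of the PRF docs against the query and
    # keep the kc most relevant chunks as expansion text.
    scored_chunks = sorted(
        ((bert_score(query, c), c) for d in prf_docs for c in chunks(d)),
        reverse=True,
    )[:kc]
    # Phase 3: combine the original query-document score with a
    # softmax-weighted sum of chunk-document scores.
    z = sum(math.exp(s) for s, _ in scored_chunks)
    def final_score(d):
        chunk_part = sum(
            (math.exp(s) / z) * bert_score(c, d) for s, c in scored_chunks
        )
        return (1 - alpha) * bert_score(query, d) + alpha * chunk_part
    return sorted(ranked, key=final_score, reverse=True)

def _toy_score(a, b):
    """Word-overlap stand-in for a BERT relevance model."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / (len(sa | sb) or 1)

reranked = bert_qe_rerank(
    "neural ranking",
    ["neural ranking with bert models", "pasta recipes and cooking tips"],
    _toy_score,
)
```

The `_toy_score` word-overlap function merely makes the sketch runnable; in BERT-QE, each call corresponds to a forward pass of a (possibly differently sized) BERT model, which is what the efficiency variants discussed later aim to reduce.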
Dataset and metrics
Akin to Guo et al. (2016) and Yilmaz et al. (2019), we use two representative ad-hoc retrieval datasets: Robust04 (Voorhees, 2004) and GOV2 (Clarke, Craswell, & Soboroff, 2004). Robust04 is a newswire collection with 249 queries and 528,155 news articles, used in the TREC 2004 Robust Track. GOV2 is a Web collection used in the TREC Terabyte Tracks 2004-2006, consisting of 150 queries and 25,205,179 documents crawled from government websites. For both datasets, we employ the title-only queries.
Results
In this section, we report results for the proposed BERT-QE model and compare them to the baseline models. We consider the following research questions:
- •
RQ1: Can BERT-QE outperform baseline models in terms of effectiveness, especially compared with those based on BERT-Large? (Section 5.1)
- •
RQ2: Is BERT-QE still effective when using smaller BERT building blocks? (Section 5.2)
- •
RQ3: Is BERT-QE still effective when using two more efficient alternatives, namely, the Late Interaction (LI) for phase two
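The Late Interaction (LI) referenced in RQ3 follows the recipe of ColBERT, cited in this paper: query and document token embeddings are produced independently, and relevance is computed with a cheap MaxSim operator instead of a full joint BERT pass. A generic pure-Python sketch of MaxSim over toy token vectors (the embeddings would in practice come from a BERT encoder; this illustrates the general technique, not the paper's exact phase-two variant):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def maxsim_score(q_embs, d_embs):
    """ColBERT-style late interaction: for each query token embedding,
    take the maximum similarity over all document token embeddings,
    then sum the maxima over query tokens."""
    return sum(max(cosine(q, d) for d in d_embs) for q in q_embs)

# Toy example: a "document" containing the query token vectors scores higher.
q_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
d_match = q_embs + [[0.0, 0.0, 1.0]]
d_other = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
```

Because document-side embeddings can be precomputed offline, this replaces the quadratic cross-attention cost of a joint BERT pass with a lookup plus MaxSim, which is what makes late interaction an efficiency lever for phase two.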
Analysis
Conclusion
We propose a novel expansion model, coined BERT-QE, to better select relevant information for query expansion. The proposed BERT-QE consists of three phases. In phase one, we perform a first-round re-ranking using a fine-tuned BERT model. In phases two and three, we use the BERT model to select expansion chunks in an unsupervised manner and then use them to re-evaluate the relevance scores of documents. Besides, in order to trade off efficiency against effectiveness, we explore two different methods in phases two and three.
CRediT authorship contribution statement
Zhi Zheng: Methodology, Investigation, Software, Validation, Writing - original draft. Kai Hui: Methodology, Writing - original draft, Writing - review & editing. Ben He: Conceptualization, Writing - review & editing. Xianpei Han: Conceptualization, Methodology. Le Sun: Supervision. Andrew Yates: Writing - review & editing.
Acknowledgment
This work is supported by the University of Chinese Academy of Sciences.
References (51)
- Query expansion techniques for information retrieval: A survey. Information Processing & Management (2019)
- A deep relevance matching model for ad-hoc retrieval
- A knowledge-based semantic framework for query expansion. Information Processing & Management (2019)
- An end-to-end pseudo relevance feedback framework for neural document retrieval. Information Processing & Management (2020)
- A pseudo-relevance feedback framework combining relevance matching and semantic matching for information retrieval. Information Processing & Management (2020)
- Probability models for information retrieval based on divergence from randomness (2003)
- FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track
- Selecting good expansion terms for pseudo-relevance feedback
- Retrievability based document selection for relevance feedback with automatically generated query variants
- Overview of the TREC 2004 Terabyte Track
- Deeper text understanding for IR with contextual neural language modeling
- BERT: Pre-training of deep bidirectional transformers for language understanding
- Query expansion with locally-trained word embeddings
- Modularized transformer-based ranking framework
- A comparative study of pseudo relevance feedback for ad-hoc retrieval
- PACRR: A position-aware neural IR model for relevance matching
- Co-PACRR: A context-aware neural IR model for ad-hoc retrieval
- Dense passage retrieval for open-domain question answering
- ColBERT: Efficient and effective passage search via contextualized late interaction over BERT
- Relevance-based language models
- NPRF: A neural pseudo relevance feedback framework for ad-hoc information retrieval
- PARADE: Passage representation aggregation for document reranking
- An attention-based deep relevance model for few-shot document filtering. ACM Transactions on Information Systems (2020)
- FastBERT: A self-distilling BERT with adaptive inference time
1. This work was performed before joining Amazon.