ABSTRACT
Query expansion is commonly used to combat the vocabulary mismatch problem: it bridges the disparity between the vocabulary used in the corpus and the vocabulary used in search queries. However, if expansion terms are not chosen carefully, spurious terms may be included, which can broaden the potential interpretations of the modified query. Unintentionally increasing semantic ambiguity in this way is known as query drift.
In this short paper we propose using the query context to inform the expansion term selection process. Using WordNet as an initial source of expansion terms, we refine the candidate expansions by discriminating on their relevance to the query. We found that our term selection process is more effective than the standard approach. Our technique targets terms that relate to the query as a whole, but predominantly focuses on excluding spurious expansion terms. Both aspects help reduce query drift and increase query performance.
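The abstract does not say how relevance to the query context is scored, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tiny hand-built taxonomy (`PARENT`) and thesaurus (`THESAURUS`) as hypothetical stand-ins for WordNet, a Wu-Palmer-style similarity as the relevance measure, and an arbitrary threshold of 0.5. Each candidate expansion is scored against the other query terms and kept only if it relates to that context.

```python
from typing import Dict, List

# Hypothetical toy taxonomy (child -> parent), standing in for WordNet hypernyms.
PARENT: Dict[str, str] = {
    "finance": "entity",
    "geography": "entity",
    "lender": "finance",
    "loan": "finance",
    "shore": "geography",
    "river": "geography",
}

# Hypothetical candidate expansions for an ambiguous query term.
THESAURUS: Dict[str, List[str]] = {"bank": ["lender", "shore"]}


def path_to_root(term: str) -> List[str]:
    """Return the chain of nodes from term up to the taxonomy root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer-style similarity, counting the root as depth 1."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_b = set(path_b)
    # Deepest shared ancestor: the first node on a's path that is also on b's path.
    lcs = next(node for node in path_a if node in ancestors_b)
    return 2 * len(path_to_root(lcs)) / (len(path_a) + len(path_b))


def refine_expansions(query: List[str], threshold: float = 0.5) -> List[str]:
    """Keep candidates whose mean similarity to the other query terms meets the threshold."""
    kept: List[str] = []
    for term in query:
        # Context is every other query term that the toy taxonomy knows about.
        context = [t for t in query if t != term and t in PARENT]
        if not context:
            continue
        for candidate in THESAURUS.get(term, []):
            score = sum(wu_palmer(candidate, t) for t in context) / len(context)
            if score >= threshold:
                kept.append(candidate)
    return kept


print(refine_expansions(["bank", "loan"]))
print(refine_expansions(["river", "bank"]))
```

In this toy setting the same ambiguous term yields different expansions depending on its neighbours in the query, which is the behaviour the abstract describes as excluding spurious expansion terms.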
Index Terms
- Refining Query Expansion Terms using Query Context