
A Survey of Automatic Query Expansion in Information Retrieval

Published: 01 January 2012


The relative ineffectiveness of information retrieval systems is largely caused by the inaccuracy with which a query formed by a few keywords models the actual user information need. One well-known method to overcome this limitation is automatic query expansion (AQE), whereby the user’s original query is augmented by new features with a similar meaning. AQE has a long history in the information retrieval community, but only in recent years has it reached a level of scientific and experimental maturity, especially in laboratory settings such as TREC. This survey presents a unified view of a large number of recent approaches to AQE that leverage various data sources and employ very different principles and techniques. The following questions are addressed. Why is query expansion so important for improving search effectiveness? What are the main steps involved in the design and implementation of an AQE component? What approaches to AQE are available, and how do they compare? Which issues must still be resolved before AQE becomes a standard component of large operational information retrieval systems (e.g., search engines)?


Agichtein, E., Lawrence, S., and Gravano, L. 2004. Learning to find answers to questions on the Web. ACM Trans. on Internet Technol. 4, 2, 1299--162.
Agirre, E., Ansa, O., Arregi, X., de Lacalle, M. L., Otegi, A., Saralegi, X., and Saragoza, H. 2009. Elhuyar-ixa: Semantic relatedness and cross-lingual passage retrieval. In Proceedings of CLEF. Springer.
Agirre, E., Di Nunzio, G. M., Mandl, T., and Otegi, A. 2009. Clef 2009 ad hoc track overview: Robust--wsd task. In Proceedings of CLEF. Springer.
Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, 207--216.
Allan, J. 1996. Incremental relevance feedback for information filtering. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 270--278.
Amati, G. 2003. Probabilistic models for information retrieval based on divergence from randomness. Ph.D. thesis, Department of Computing Science, University of Glasgow, UK.
Amati, G., Carpineto, C., and Romano, G. 2001. FUB at TREC-10 Web Track: A probabilistic framework for topic relevance term weighting. In Proceedings of the 10th Text REtrieval Conference (TREC’10). NIST Special Publication 500--250. National Institute of Standards and Technology (NIST), Gaithersburg, MD, 182--191.
Amati, G., Carpineto, C., and Romano, G. 2003. Comparing weighting models for monolingual information retrieval. In Proceedings of the 4th Workshop of the Cross-Language Evaluation Forum (CLEF’03). Springer, 310--318.
Amati, G., Carpineto, C., and Romano, G. 2004. Query difficulty, robustness, and selective application of query expansion. In Proceedings of the 26th European Conference on Information Retrieval (ECIR’04). Springer, 127--137.
Anderson, J. R. 1983. A spreading activation theory of memory. J. Verbal Learn. Verbal Behav. 22, 261--295.
Arguello, J., Elsas, J. L., Callan, J., and Carbonell, J. G. 2008. Document representation and query expansion models for blog recommendation. In Proceedings of the 2nd International Conference on Weblogs and Social Media. AAAI Press, 10--18.
Attar, R. and Fraenkel, A. S. 1977. Local feedback in full-text retrieval systems. J. ACM 24, 3, 397--417.
Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. Addison Wesley.
Bai, J., Nie, J.-Y., and Cao, G. 2006. Context-dependent term relations for information retrieval. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 551--559.
Bai, J., Song, D., Bruza, P., Nie, J.-Y., and Cao, G. 2005. Query expansion using term relationships in language models for information retrieval. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management. ACM Press, 688--695.
Bai, J., Nie, J.-Y., Cao, G., and Bouchard, H. 2007. Using query contexts in information retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 15--22.
Ballesteros, L. and Croft, W. B. 1997. Phrasal translation and query expansion techniques for cross-language information retrieval. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 84--91.
Ballesteros, L. and Croft, W. B. 1998. Resolving ambiguity for cross-language retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 64--71.
Bast, H. and Weber, I. 2006. Type less, find more: fast autocompletion search with a succinct index. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 364--371.
Bast, H., Majumdar, D., and Weber, I. 2007. Efficient interactive query expansion with complete search. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 857--860.
Beeferman, D. and Berger, A. 2000. Agglomerative clustering of a search engine query log. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 407--416.
Belkin, N. J. and Croft, W. B. 1992. Information filtering and information retrieval: Two sides of the same coin? Comm. ACM 35, 12, 29--38.
Bernardini, A. and Carpineto, C. 2008. Fub at trec 2008 relevance feedback track: extending rocchio with distributional term analysis. In Proceedings of TREC-2008. National Institute of Standards and Technology, Gaithersburg, MD, USA.
Bernardini, A., Carpineto, C., and D’Amico, M. 2009. Full-subtopic retrieval with keyphrase-based search results clustering. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Computer Society, 206--213.
Bhogal, J., Macfarlane, A., and Smith, P. 2007. A review of ontology based query expansion. Info. Process. Manage. 43, 4, 866--886.
Billerbeck, B. 2005. Efficient query expansion. Ph.D. thesis, RMIT University, Melbourne, Australia.
Billerbeck, B. and Zobel, J. 2003. When query expansion fails. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval. ACM Press, 387--388.
Billerbeck, B. and Zobel, J. 2004a. Questioning query expansion: An examination of behaviour and parameters. In Proceedings of the 15th Australasian Database Conference. Vol. 27, Australian Computer Society, 69--76.
Billerbeck, B. and Zobel, J. 2004b. Techniques for efficient query expansion. In Proceedings of the String Processing and Information Retrieval Symposium. Springer, 30--42.
Billerbeck, B. and Zobel, J. 2005. Document expansion versus query expansion for ad-hoc retrieval. In Proceedings of the 10th Australasian Document Computing Symposium. Australian Computer Society, Sydney, Australia, 34--41.
Billerbeck, B., Scholer, F., Williams, H. E., and Zobel, J. 2003. Query expansion using associated queries. In Proceedings of the 12th ACM International Conference on Information and Knowledge Management. ACM Press, 2--9.
Bilotti, M., Katz, B., and Lin, J. 2004. What works better for question answering: Stemming or morphological query expansion? In Proceedings of the Information Retrieval for Question Answering (IR4QA) Workshop at SIGIR’04.
Bodoff, D. and Kambil, A. 1998. Partial coordination. I. The best of pre-coordination and post-coordination. J. Amer. Soc. Info. Sciences 49, 14, 1254--1269.
Broder, A. 2002. A taxonomy of web search. ACM SIGIR Forum 36, 2, 3--10.
Broder, A., Ciccolo, P., E.Gabrilovich, Josifovski, V., Metzler, D., Riedel, L., and Yuan, J. 2009. Online expansion of rare queries for sponsored search. In Proceedings of the 18th international conference on World Wide Web. ACM, 511--520.
Buckley, C. and Harman, D. K. 2003. Reliable information access final workshop report. In Proceedings of the Reliable Information Access Workshop (RIA). NRRC, 1--30.
Buckley, C., Salton, G., Allan, G., and Singhal, A. 1995. Automatic query expansion using smart: Trec3. In Proceedings of the 3rd Text REtrieval Conference (TREC-3). NIST Special Publication 500--226. National Institute of Standards and Technology (NIST), Gaithersburg, MD, 69--80.
Buscher, G., Dengel, A., and van Elst, L. 2008. Query expansion using gaze-based feedback on the subdocument level. In Proceedings of the 31th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 387--394.
Cao, G., Gao, J., Nie, J.-Y., and Bai, J. 2007. Extending query translation to cross-language query expansion with markov chain models. In Proceedings of the 16th Conference on Information and Knowledge Management (CIKM’07). ACM Press.
Cao, G., Gao, J., Nie, J.-Y., and Robertson, S. 2008. Selecting good expansion terms for pseudo-relevance feedback. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 243--250.
Carmel, D., Farchi, E., Petruschka, Y., and Soffer, A. 2002. Automatic query refinement using lexical affinities with maximal information gain. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 283--290.
Carpineto, C. and Romano, G. 2004. Concept Data Analysis: Theory and Applications. John Wiley & Sons.
Carpineto, C., De Mori, R., Romano, G., and Bigi, B. 2001. An information theoretic approach to automatic query expansion. ACM Trans. Info. Syst. 19, 1, 1--27.
Carpineto, C., Romano, G., and Giannini, V. 2002. Improving retrieval feedback with multiple term-ranking function combination. ACM Trans. Info. Syst. 20, 3, 259--290.
Carpineto, C., Osiński, S., Romano, G., and Weiss, D. 2009. A survey of Web clustering engines. ACM Comput. Surv. 41, 3.
Chang, Y., Ounis, I., and Kim, M. 2006. Query reformulation using automatically generated query concepts from a document space. Info. Process. Manage. 42, 2, 453--468.
Chen, L., L’Abbate, M., Thiel, U., and Neuhold, E. J. 2004. Increasing the customers choice: Query expansion based on the layer-seeds method and its application in e-commerce. In Proceedings of the IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE’04). IEEE Computer Society, 317--324.
Chirita, P.-A., Firan, C. S., and Nejdl, W. 2007. Personalized query expansion for the web. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 7--14.
Chu, W. W., Liu, Z., and Mao, W. 2002. Textual document indexing and retrieval via knowledge sources and data mining. Comm. Institute of Info. Comput. Machinery 5, 2.
Church, K. and Hanks, P. 1990. Word association norms, mutual information and lexicography. Computat. Linguist. 16, 1, 22--29.
Church, K. and Smyth, B. 2007. Mobile content enrichment. In Proceedings of the 12th International Conference on Intelligent User Interfaces. ACM Press, 112--121.
Collins-Thompson, K. 2009. Reducing the risk of query expansion via robust constrained optimization. In Proceedings of the 18th Conference on Information and Knowledge Management (CIKM’09). ACM Press, 837--846.
Collins-Thompson, K. and Callan, J. 2005. Query expansion using random walk models. In Proceedings of the 14th Conference on Information and Knowledge Management (CIKM’05). ACM Press, 704--711.
Collins-Thompson, K. and Callan, J. 2007. Estimation and use of uncertainty in pseudo-relevance feedback. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 303--310.
Crabtree, D., Andreae, P., and Gao, X. 2007. Exploiting underrepresented query aspects for automatic query expansion. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 191--200.
Crestani, F. 1997. Application of spreading activation techniques in information retrieval. Artif. Intell. 11, 6, 453--482.
Cronen-Townsend, S. and Croft, W. B. 2002. Quantifying query ambiguity. In Proceedings of the 2nd International Conference on Human Language Technology Research. ACM Press, 104--109.
Crouch, C. and Yang, B. 1992. Experiments in automatic statistical thesaurus construction. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 77--88.
Cui, H., Wen, J.-R., Nie, J.-Y., and Ma, W.-Y. 2003. Query expansion by mining user logs. IEEE Trans. Knowl. Data Engin. 15, 4, 829--839.
Custis, T. and Al-Kofahi, K. 2007. A new approach for evaluating query expansion: Query-document term mismatch. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 575--582.
Deerwester, S., Dumais, S. T., Furnas, W., Landauer, T. K., and Harshman, R. 1990. Indexing by latent semantic analysis. J. Amer. Soc. Info. Science 41, 6, 391--407.
Dempster, A., Laird, N., and Rubin, D. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. Series B (Methodological) 39, 1, 1--38.
Diaz, F. and Metzler, D. 2006. Improving the estimation of relevance models using large external corpora. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 154--161.
Doszkocs, T. E. 1978. AID, an Associative Interactive Dictionary for Online Searching. Online Rev. 2, 2, 163--174.
Efron, M. 2008. Query Expansion and Dimensionality Reduction: Notions of Optimality in Rocchio Relevance Feedback and Latent Semantic Indexing. Info. Process. Manage. 44, 1, 163--180.
Efthimiadis, E. N. 1993. A user-centred evaluation of ranking algorithms for interactive query expansion. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 146--159.
Efthimiadis, E. N. 1996. Query expansion. In Annual Review of Information Systems and Technology, M. E. Williams Ed., ASIS&T, 121--187.
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., and Ruppin, E. 2002. Placing search in context: The concept revisited. ACM Trans. Info. Syst. 20, 1, 116--131.
Fitzpatrick, L. and Dent, M. 1997. Automatic feedback using past queries: Social searching? In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 306--313.
Flemmings, R., Barros, J., Geraldo, A. P., and Moreira, V. P. 2009. Bbk-ufrgs@clef2009: Query expansion of geographic place names. In Proceedings of CLEF.
Fujii, A. 2008. Modeling anchor text and classifying queries to enhance web document retrieval. In Proceeding of the 17th International Conference on World Wide Web. ACM Press, 337--346.
Furnas, G. W., Landauer, T. K., Gomez, L. M., and Dumais, S. T. 1987. The vocabulary problem in human-system communication. Comm. ACM 30, 11, 964--971.
Gauch, S., Wang, J., and Rachakonda, S. M. 1999. A corpus analysis approach for automatic query expansion and its extension to multiple databases. ACM Trans. Info. Syst. 17, 3, 250--269.
Gong, Z., Cheang, C.-W., and U, L. 2006. Multi-term web query expansion using wordnet. In Proceedings of the 17th International Conference on Database and Expert Systems Applications (DEXA’06). Springer, 379--388.
Gonzalo, J., Verdejo, F., Chugur, I., and Cigarrän, J. M. 1998. Indexing with wordnet synsets can improve text retrieval. In Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems. Association for Computational Linguistics, 647--678.
Graupmann, J., Cai, J., and Schenkel, R. 2005. Automatic query refinement using mined semantic relations. In Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration (WIRI). IEEE Computer Society, 205--213.
Hanani, U., Shapira, B., and Shoval, P. 2004. Information filtering: Overview of issues, research and systems. User Model. User-Adapt. Interact. 11, 3, 203--259.
Harabagiu, S. and Lacatusu, F. 2004. Strategies for advanced question answering. In Proceedings of the HLT- NAACL’04 Workshop on Pragmatics of Question Answering. 1--9.
Harabagiu, S., Moldovan, D., Pasca, M., Mihalcea, R., Surdeanu, M., Bunescu, R., Grju, R., Rus, V., and Morarescu, P. 2001. The role of lexico-semantic feedback in open-domain textual question-answering. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL-01). Association for Computational Linguistics, 282--289.
Harman, D. K. 1992. Relevance feedback and other query modification techniques. In Information Retrieval -- Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates Eds., Prentice Hall, Englewood Cliffs, N. J., 241--263.
Harper, G. W. and van Rijsbergen, C. J. 1978. An evaluation of feedback in document retrieval using co-occurrence data. J. Documentation 34, 3, 189--216.
Hauff, C., Hiemstra, D., and de Jong, F. 2008. A survey of pre-retrieval query performance predictors. In Proceedings of the 17th Conference on Information and Knowledge Management (CIKM’08). ACM Press, 1419--1420.
He, B. and Ounis, I. 2007. Combining fields for query expansion and adaptive query expansion. Info. Process. Manage. 43, 1294--1307.
He, B. and Ounis, I. 2009a. Finding good feedback documents. In Proceedings of the 18th Conference on Information and Knowledge Management (CIKM’09). ACM Press, 2011--2014.
He, B. and Ounis, I. 2009b. Studying query expansion effectiveness. In Proceedings of the 31th European Conference on Information Retrieval (ECIR’09). Springer, 611--619.
Hidalgo, J. M. G., de Buenaga Rodríguez, M., and Pérez, J. C. C. 2005. The role of word sense disambiguation in automated text categorization. In Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems. Springer, 298--309.
Hu, J., Deng, W., and Guo, J. 2006. Improving retrieval performance by global analysis. In Proceedings of the 18th International Conference on Pattern Recognition. IEEE Computer Society, 703--706.
Huang, C.-C., Chien, L.-F., and Oyang, Y.-J. 2003. Relevant term suggestion in interactive web search based on contextual information in query session logs. J. Amer. Soc. Info. Science Technol. 54, 7, 638--649.
Huang, C.-C., Lin, K.-M., and Chien, L.-F. 2005. Automatic training corpora acquisition through web mining. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Computer Society, 193--199.
Hull, D. A. 1996. Stemming algorithms: a case study for detailed evaluation. J. Amer. Soc. Info. Science 47, 1, 70--84.
Ide, E. 1971. New experiments in relevance feedback. In The SMART Retrieval System, G. Salton Ed., Prentice Hall, Englewood Cliffs, N. J., 337--354.
Jelinek, F. and Mercer, R. L. 1980. Interpolated estimation of markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice. North-Holland, Amsterdam, The Netherlands, 381--397.
Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., and Gay, G. 2007. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Trans. Info. Syst. 25, 2, 7.
Jones, R., Rey, B., Madani, O., and Greiner, W. 2006. Generating query substitutions. In Proceedings of the 15th International Conference on World Wide Web. ACM Press, 387--396.
Jones, S. 1993. A thesaurus data model for an intelligent retrieval system. J. Info. Science 19, 3, 167--178.
Jones, S. 1995. Interactive thesaurus navigation: Intelligence rules ok? J. Amer. Soc. for Info. Science 46, 1, 52--59.
Kamvar, M. and Baluja, S. 2007. The role of context in query input: Using contextual signals to complete queries on mobile devices. In Proceedings of the 9th International Conference on Human Computer Interaction with Mobile Devices and Services. ACM Press, 405--412.
Kanaan, G., Al-Shalabi, R., Ghwanmeh, S., and Bani-Ismail, B. 2008. Interactive and automatic query expansion: A comparative study with an application on Arabic. Amer. J. Appl. Sciences 5, 11, 1433--1436.
Kekäläinen, J. and Järvelin, K. 1998. The impact of query structure and query expansion on retrieval performance. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 130--137.
Kherfi, M. L., Ziou, D., and Bernardi., A. 2004. Image retrieval from the World Wide Web: Issues, techniques, and systems. ACM Comput. Surv. 36, 1, 35--67.
Koehn, P. 2010. Statistical Machine Translation. Cambridge University Press.
Kraaij, W., Nie, J., and Simard, M. 2003. Embedding Web-Based Statistical Translation Models in Cross-Language Information Retrieval. Computat. Linguist. 29, 3, 381--420.
Kraft, R. and Zien, J. 2004. Mining anchor text for query refinement. In Proceedings of the 13th International Conference on World Wide Web. ACM Press, 666--674.
Krovetz, R. 1993. Viewing morphology as an inference process. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 191--202.
Krovetz, R. and Croft, W. B. 1992. Lexical ambiguity and information retrieval. ACM Trans. Info. Syst. 10, 2, 115--141.
Kurland, O., Lee, L., and Domshlak, C. 2005. Better than the real thing?: Iterative pseudo-query processing using cluster-based language models. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 19--26.
Kwok, K. L., Grunfeld, L., Sun, K. L., and Deng, P. 2004. TREC2004 robust track experiments using PIRCS. In Proceedings of the 13th Text REtrieval Conference (TREC-8). National Institute of Standards and Technology, Gaithersburg, MD.
Lam-Adesina, A. M. and Jones, G. J. F. 2001. Applying summarization techniques for term selection in relevance feedback. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 1--9.
Latiri, C. C., Yahia, S. B., Chevallet, J. P., and Jaoua, A. 2004. Query expansion using fuzzy association rules between terms. In Proceedings of the 4th International Conference Journées de l’Informatique Messine (JIM’03).
Lau, R. Y. K., Bruza, P. D., and Song, D. 2004. Belief revision for adaptive information retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 130--137.
Lau, T. and Horvitz, E. 1999. Patterns of search: Analyzing and modeling web query refinement. In Proceedings of the 7th International Conference on User Modeling. Springer, 119--128.
Lavelli, A., Sebastiani, F., and Zanoli, R. 2004. Distributional term representations: an experimental comparison. In Proceedings of the 16th Conference on Information and Knowledge Management (CIKM’04). ACM Press, 615--624.
Lavrenko, V. and Allan, J. 2006. Realtime query expansion in relevance models. IR 473, University of Massachusetts.
Lavrenko, V. and Croft, W. B. 2001. Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 120--127.
Lee, K. S., Croft, W. B., and Allan, J. 2008. A cluster-based resampling method for pseudo-relevance feedback. In Proceedings of the 31th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 235--242.
Lesk, M. E. 1969. Word-Word Associations in Document Retrieval Systems. Amer. Documentation 20, 1, 8--36.
Lesk, M. E. 1988. They said true things, but called them by wrong names -- vocabulary problems over time in retrieval. In Proceedings of the Waterloo OED Conference. ACM Press, 1--10.
Lin, J. and Murray, G. C. 2005. Assessing the term independence assumption in blind relevance feedback. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 635--636.
Liu, S., Liu, F., Yu, C., and Meng, W. 2004. An effective approach to document retrieval via utilizing wordnet and recognizing phrases. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 266--272.
Liu, Y., Li, C., Zhang, P., and Xiong, Z. 2008. A query expansion algorithm based on phrases semantic similarity. In Proceedings of the International Symposiums on Information Processing. IEEE Computer Society, 31--35.
Lv, Y. and Zhai, C. 2009. Adaptive relevance feedback in information retrieval. In Proceedings of the 18th Conference on Information and Knowledge Management (CIKM’09). ACM Press, 255--264.
Macdonald, C. and Ounis, I. 2007. Expertise drift and query expansion in expert search. In Proceedings of the 16th Conference on Information and Knowledge Management (CIKM’07). ACM Press.
Mandala, R., Takenobu, T., and Hozumi, T. 1998. The use of wordnet in information retrieval. In Proceedings of the ACL Workshop on the Usage of WordNet in Information Retrieval. Association for Computational Linguistics, 31--37.
Mandala, R., Tokunaga, T., and Tanaka, H. 1999. Combining multiple evidence from different types of thesaurus for query expansion. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 191--197.
Manning, C. D., Raghavan, P., and Sch’́utze, H. 2008. Introduction to Information Retrieval. Cambridge University Press.
Maron, M. E. and Kuhns, J. L. 1960. On relevance, probabilistic indexing and information retrieval. J. ACM 7, 3, 216--244.
McNamee, P. and Mayfield, J. 2002. Comparing cross-language query expansion techniques by degrading translation resources. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 159--166.
Melucci, M. 2008. A Basis for Information Retrieval in Context. ACM Trans. Info. Syst. 26, 3, Article No 14.
Metzler, D. and Croft, W. B. 2007. Latent concept expansion using Markov random fields. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 311--318.
Miller, G. A., Beckwith, R. T., Fellbaum, C. D., Gross, D., and Miller, K. 1990. WordNet: An online lexical database. Int. J. Lexicography 3, 4, 235--244.
Milne, D. N., Witten, I. H., and Nichols, D. M. 2007. A knowledge-based search engine powered by wikipedia. In Proceedings of the 16th ACM Conference on Information and Knowledge Management. ACM Press, 445--454.
Minker, J., Wilson, G. A., and Zimmerman, B. H. 1972. An evaluation of query expansion by the addition of clustered terms for a document retrieval system. Info. Stor. Retrieval 8, 6, 329--348.
Mitra, M., Singhal, A., and Buckley, C. 1998. Improving automatic query expansion. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 206--214.
Montague, M. and Aslam, J. 2001. Relevance score normalization for metasearch. In Proceedings of the 10th International Conference on Information and Knowledge Management. ACM Press, 427--433.
Nallapati, R. and Shah, C. 2006. Evaluating the quality of query refinement suggestions in information retrieval. IR 521, University of Massachusetts.
Natsev, A., Haubold, A., Tes̆ić, J., Xie, L., and Yan, R. 2007. Semantic concept-based query expansion and re-ranking for multimedia retrieval. In Proceedings of the 15th International Conference on Multimedia. ACM Press, 991--1000.
Navigli, R. 2009. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, 1--69.
Navigli, R. and Velardi, P. 2003. An analysis of ontology-based query expansion strategies. In Proceedings of the ECML/PKDD-2003 Workshop on Adaptive Text Extraction and Mining.
Navigli, R. and Velardi, P. 2005. Structural semantic interconnections: A knowledge-based approach to word sense disambiguation. IEEE Trans. Pattern Anal. Mach. Intell. 27, 7, 1075--1086.
Osiński, S. and Weiss, D. 2005. A concept-driven algorithm for clustering search results. IEEE Intell. Syst. 20, 3, 48--54.
Palleti, P., Karnick, H., and Mitra, P. 2007. Personalizedweb search using probabilistic query expansion. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Computer Society, 83--86.
Park, L. A. F. and Ramamohanarao, K. 2007. Query expansion using a collection dependent probabilistic latent semantic thesaurus. In Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’07). Springer, 224--235.
Perugini, S. and Ramakrishnan, N. 2006. Interacting with web hierarchies. IT Professional 8, 4, 19--28.
Pirkola, A., Hedlund, T., Keskusalo, H., and Ja̋rvelin, K. 2001. Dictionary-based cross-language information retrieval: Problems, methods, and research findings. Info. Retrieval 4, 209--230.
Porter, M. F. 1982. Implementing a probabilistic information retrieval system. Info. Technol.: Resear. Develop. 1, 2, 131--156.
Porter, M. F. 1997. An algorithm for suffix stripping. In Readings in Information Retrieval, K. S. Jones and P. Willett Eds., Morgan Kaufmann, 313--316.
Qiu, Y. and Frei, H.-P. 1993. Concept-based query expansion. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 160--169.
Riezler, S., Vasserman, A., Tsochantaridis, I., Mittal, V., and Liu, Y. 2007. Statistical machine translation for query expansion in answer retrieval. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-07). Association for Computational Linguistics, 464--471.
Robertson, S. E. 1986. On relevance weight estimation and query expansion. J. Documentation 42, 3, 182--188.
Robertson, S. E. 1990. On term selection for query expansion. J. Documentation 46, 4, 359--364.
Robertson, S. E. and Sparck Jones, K. 1976. Relevance weighting of search terms. J. Amer. Soc. Info. Science 27, 129--146.
Robertson, S. E. and Walker, S. 2000. Microsoft cambridge at trec-9: Filtering track. In Proceedings of the 9th Text REtrieval Conference (TREC-9). NIST Special Publication 500-249. National Institute of Standards and Technology (NIST), Gaithersburg, MD, 361--368.
Robertson, S. E., Walker, S., and Beaulieu, M. M. 1998. Okapi at TREC-7: Automatic ad hoc, filtering, VLC, and interactive track. In Proceedings of the 7th Text REtrieval Conference (TREC-7), NIST Special Publication 500-242. National Institute of Standards and Technology (NIST), Gaithersburg, MD, 253--264.
Rocchio, J. J. 1971. Relevance feedback in information retrieval. In The SMART Retrieval System, G. Salton Ed., Prentice-Hall, Englewood Cliffs, NJ, 313--323.
Ruthven, I. 2003. Re-examining the potential effectiveness of interactive query expansion. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 213--220.
Ruthven, I. and Lalmas, M. 2003. A survey on the use of relevance feedback for information access systems. Knowl. Engin. Rev. 18, 2, 95--145.
Sahlgren, M. 2005. An introduction to random indexing. In Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering.
Sakai, T., Manabe, M., and Koyama, M. 2005. Flexible pseudo-relevance feedback via selective sampling. ACM Trans. Info. Syst. 4, 2, 111--35.
Salton, G. and Buckley, C. 1990. Improving retrieval performance by relevance feedback. J. Amer. Soc. Info. Science 41, 4, 288--297.
Salton, G. and McGill, M. 1983. Introduction to Modern Information Retrieval. McGraw Hill, New York, NY.
Sanderson, M. 1994. Word sense disambiguation and information retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 142--151.
Sanderson, M. 2000. Retrieving with good sense. Info. Retrieval 2, 1, 49--69.
Savoy, J. 2005. Comparative study of monolingual and multilingual search models for use with asian languages. ACM Trans. Asian Lang. Info. Process. 4, 2, 163--189.
Schlaefer, N., Ko, J., Betteridge, J., Sautter, G., and amd E. Nyberg, M. P. 2007. Semantic extensions of the Ephyra QA system for TREC 2007. In Proceedings of the 16th Text REtrieval Conference (TREC’07). NIST Special Publication 500-274. National Institute of Standards and Technology (NIST), Gaithersburg, MD, 332--341.
Schütze, H. and Pedersen, J. O. 1995. Information retrieval based on word senses. In Proceedings of the 4th Annual Symposium on Document Analysis and Information Retrieval. 161--175.
Schütze, H. and Pedersen, O. 1997. A co-occurrence based thesaurus and two applications to information retrieval. Info. Process. Manage. 33, 3, 307--318.
Semeraro, G., Lops, P., Basile, P., and de Gemmis, M. 2009. On the tip of my thought: Playing the guillotine game. In Proceedings of the 21st International Joint Conference on Artificial Intelligence. AAAI Press, 1543--1548.
Shen, X. and Zhai, C. 2005. Active feedback in ad hoc information retrieval. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 59--66.
Shokouhi, M., Azzopardi, L., and Thomas, P. 2009. Effective query expansion for federated search. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 427--434.
Singhal, A. and Pereira, F. 1999. Document expansion for speech retrieval. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 34--41.
Song, M., Song, I.-Y., Allen, R. B., and Obradovic, Z. 2006. Keyphrase extraction-based query expansion in digital libraries. In Proceedings of the 6th ACM/IEEE-CS joint International Conference on Digital Libraries (JCDL’06). ACM Press, 202--209.
Song, M., Song, I.-Y., Hu, X., and Allen, R. B. 2007. Integration of association rules and ontologies for semantic query expansion. Data Knowl. Engin. 63, 1, 63--75.
Sun, R., Ong, C.-H., and Chua, T.-S. 2006. Mining dependency relations for query expansion in passage retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 382--389.
Suryanto, M. A., Lim, E.-P., Sun, A., and Chiang, R. H. 2007. Document expansion versus query expansion for ad-hoc retrieval. In Proceedings of the ACM 1st Workshop on CyberInfrastructure: Information Management in eScience. ACM Press, 47--54.
Theobald, M., Shenkel, R., and Weikum, G. 2004. Top-k query evaluation with probabilistic guarantees. In Proceedings of the 13th International Conference on Very Large Data Bases. ACM Press, 648--659.
Theobald, M., Shenkel, R., and Weikum, G. 2005. Efficient and selftuning incremental query expansion for top-k query processing. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 242--249.
van Rijsbergen, C. J. 1979. Information Retrieval. Butterworths.
Vechtomova, O. 2009. Query expansion for information retrieval. In Encyclopedia of Database Systems, L. Liu and M. T. Özsu Eds., Springer, 2254--2257.
Vechtomova, O. and Karamuftuoglu, M. 2004. Elicitation and use of relevance feedback information. Info. Process. Manage. 42, 1, 191--206.
Véronis, J. 2004. HyperLex: lexical cartography for information retrieval. Computer Speech Lang. 18, 3, 223--252.
Voorhees, E. 1993. Using wordnet to disambiguate word senses for text retrieval. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 171--180.
Voorhees, E. 1994. Query expansion using lexical-semantic relations. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 61--69.
Voorhees, E. 2004. Overview of the trec 2004 robust track. In Proceedings of the 13th Text REtrieval Conference (TREC-7). NIST Special Publication 500-261. National Institute of Standards and Technology (NIST), Gaithersburg, MD.
Voorhees, E. and Harman, D. 1998. Overview of the seventh text retrieval conference (TREC-7). In Proceedings of the 7th Text REtrieval Conference (TREC-7). NIST Special Publication 500-242. National Institute of Standards and Technology (NIST), Gaithersburg, MD, 1--24.
Wang, H., Liang, Y., Fu, L., Xue, G.-R., and Yu, Y. 2009. Efficient query expansion for advertisement search. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 51--58.
Wang, X., Fang, H., and Zhai, C. 2008. A study of methods for negative relevance feedback. In Proceedings of the 31th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 219--226.
Wei, X. and Croft, W. B. 2007. Modeling term associations for ad-hoc retrieval performance within language modeling framework. In Proceedings of the 29th European Conference on IR Research (ECIR’07). Springer, 52--63.
White, R. W., Ruthven, I., and Jose, J. M. 2005. A study of factors affecting the utility of implicit relevance feedback. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 35--42.
Winaver, M., Kurland, O., and Domshlak, C. 2007. Towards robust query expansion: Model selection in the language modeling framework. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 729--730.
Witten, I. H., Moffat, A., and Bell, T. C. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images 2nd Ed. Morgan Kaufman.
Wong, S. K. M., Ziarko, W., Raghavan, V. V., and Wong, P. C. N. 1987. On modeling of information retrieval concepts in vector spaces. ACM Trans. Datab. Syst. 12, 2, 299--321.
Wong, W. S., Luk, R. W. P., Leong, H. V., Ho, K. S., and Lee, D. L. 2008. Re-examining the effects of adding relevance information in a relevance feedback environment. Info. Process. Manage. 44, 3, 1086--1116.
Xu, J. and Croft, W. B. 1996. Query expansion using local and global document analysis. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 4--11.
Xu, J. and Croft, W. B. 2000. Improving the effectiveness of information retrieval with local context analysis. ACM Trans. Info. Syst. 18, 1, 79--112.
Xu, Y., Jones, G. J. F., and Wang, B. 2009. Query dependent pseudo-relevance feedback based on wikipedia. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 59--66.
Xu, Z. and Akella, R. 2007. Incorporating diversity and density in active learning for relevance feedback. In Proceedings of the 29th European Conference on IR Research (ECIR’07). Springer, 246--257.
Xue, G.-R., Zeng, H.-J., Chen, Z., Yu, Y., Ma, W.-Y., Xi, W., and Fan, W. 2004. Optimizing web search using web click-through data. In Proceedings of the 13th ACM International Conference on Information and Knowledge Management. ACM Press, 118--126.
Yin, Z., Shokouhi, M., and Craswell, N. 2009. Query expansion using external evidence. In Proceedings of the 31th European Conference on Information Retrieval (ECIR’09). Springer, 362--374.
Yu, S., Cai, D., Wen, J. R., and Ma, W. Y. 2003. Improving pseudo-relevance feedback in web information retrieval using web page segmentation. In Proceedings of the 12th International Conference on World Wide Web. ACM, 11--18.
Zelikovitz, S. and Hirsh, H. 2000. Improving short-text classification using unlabeled background knowledge to assess document similarity. In Proceedings of the 17th International Conference on Machine Learning (ICML’00). National Institute of Standards and Technology (NIST), 1183--1190.
Zha, Z.-J., Yang, L., Mei, T., Wang, M., and Wang, Z. 2009. Visual query suggestion. In Proceedings of the 17th ACM International Conference on Multimedia. ACM Press, 15--24.
Zhai, C. and Lafferty, J. 2001a. Model-based feedback in the language modeling approach to information retrieval. In Proceedings of the 10th International Conference on Information and Knowledge Management. ACM Press, 403--410.
Zhai, C. and Lafferty, J. 2001b. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 334--342.
Zimmer, C., Tryfonopoulos, C., and Weikum, G. 2008. Exploiting correlated keywords to improve approximate information filtering. In Proceedings of the 31th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 323--330.

      Published In

      ACM Computing Surveys  Volume 44, Issue 1
      January 2012
      181 pages
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 January 2012
      Accepted: 01 March 2010
      Revised: 01 February 2010
      Received: 01 November 2009
      Published in CSUR Volume 44, Issue 1


      Author Tags

      1. Query expansion
      2. document ranking
      3. pseudo-relevance feedback
      4. query refinement
      5. search
      6. word associations


      • Research-article
      • Research
      • Refereed


      Cited By

      • (2024) pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy. The Astrophysical Journal Supplement Series 275:2 (38). DOI: 10.3847/1538-4365/ad7c43. Online publication date: 29-Nov-2024
      • (2024) Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security (1181-1195). DOI: 10.1145/3658644.3690327. Online publication date: 2-Dec-2024
      • (2024) Content-Based Exclusion Queries in Keyword-Based Image Retrieval. Proceedings of the 2024 International Conference on Multimedia Retrieval (1145-1149). DOI: 10.1145/3652583.3657619. Online publication date: 30-May-2024
      • (2024) The Surprising Effectiveness of Rankers trained on Expanded Queries. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (2652-2656). DOI: 10.1145/3626772.3657938. Online publication date: 10-Jul-2024
      • (2024) A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (2271-2275). DOI: 10.1145/3626772.3657933. Online publication date: 10-Jul-2024
      • (2024) Capability-aware Prompt Reformulation Learning for Text-to-Image Generation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (2145-2155). DOI: 10.1145/3626772.3657787. Online publication date: 10-Jul-2024
      • (2024) MOJI: Enhancing Emoji Search System with Query Expansions and Emoji Recommendations. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (1-8). DOI: 10.1145/3613905.3650838. Online publication date: 11-May-2024
      • (2024) On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3608139. Online publication date: 20-May-2024
      • (2024) A Case Study of Enhancing Sparse Retrieval using LLMs. Companion Proceedings of the ACM Web Conference 2024 (1609-1615). DOI: 10.1145/3589335.3651945. Online publication date: 13-May-2024
      • (2024) Retrieval-augmented Query Reformulation for Heterogeneous Research Asset Retrieval in Virtual Research Environment. Companion Proceedings of the ACM Web Conference 2024 (907-910). DOI: 10.1145/3589335.3651553. Online publication date: 13-May-2024
