Abstract
A major obstacle to effective information retrieval is vocabulary mismatch. Query expansion addresses this problem by reformulating each search query with additional terms that better define the user's information need. Many researchers have contributed to improving the accuracy of information retrieval systems through different approaches to query expansion. In this article, we primarily discuss statistical query expansion approaches, which include document analysis, search and browse log analysis, and web knowledge analysis. In addition to proposing a comprehensive classification of these approaches, we briefly analyse the pros and cons of each technique. Finally, we evaluate these techniques using five functional features and experimental settings such as the TREC collection and the results of performance metrics. Our in-depth survey of statistical query expansion approaches suggests that the best approach depends on the type of search query, the nature and availability of data resources, and performance efficiency requirements.


About this article
Cite this article
Raza, M.A., Mokhtar, R. & Ahmad, N. A survey of statistical approaches for query expansion. Knowl Inf Syst 61, 1–25 (2019). https://doi.org/10.1007/s10115-018-1269-8
DOI: https://doi.org/10.1007/s10115-018-1269-8