
A survey of statistical approaches for query expansion

  • Survey Paper
Knowledge and Information Systems

Abstract

A major issue in effective information retrieval is vocabulary mismatch. Query expansion addresses this issue by reformulating each search query with additional terms that better define the user's information needs. Many researchers have contributed to improving the accuracy of information retrieval systems through different approaches to query expansion. In this article, we primarily discuss statistical query expansion approaches, which include document analysis, search and browse log analysis, and web knowledge analysis. In addition to proposing a comprehensive classification for these approaches, we briefly analyse the pros and cons of each technique. Finally, we evaluate these techniques using five functional features and experimental settings such as the TREC collection and the results of performance metrics. Our in-depth survey of statistical query expansion approaches suggests that the best approach depends on the type of search query, the nature and availability of data resources, and performance efficiency requirements.
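To make the idea of expanding a query through document analysis concrete, the following is a minimal sketch of pseudo-relevance feedback, one common technique in that family. It is an illustration under simplifying assumptions, not the method of any surveyed paper: documents are ranked by raw query-term counts, expansion terms are picked by frequency in the top-ranked documents, and the names `tokenize`, `expand_query`, `top_docs`, `extra_terms` and the tiny stopword list are all hypothetical.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def expand_query(query, documents, top_docs=2, extra_terms=2):
    """Return the query terms plus frequent terms from the top-ranked documents."""
    q_terms = tokenize(query)

    def score(doc):
        # Crude first-pass ranking: count occurrences of query terms.
        counts = Counter(tokenize(doc))
        return sum(counts[t] for t in q_terms)

    # Treat the highest-scoring documents as if they were relevant.
    ranked = sorted(documents, key=score, reverse=True)[:top_docs]

    # Count candidate expansion terms that are not already in the query.
    candidates = Counter()
    for doc in ranked:
        candidates.update(t for t in tokenize(doc) if t not in q_terms)

    # Most frequent first; ties broken alphabetically for determinism.
    best = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return q_terms + [term for term, _ in best[:extra_terms]]


if __name__ == "__main__":
    docs = [
        "car engine repair and car maintenance",
        "automobile engine repair guide",
        "banana bread recipe",
    ]
    print(expand_query("car repair", docs))
```

In this toy collection the query "car repair" picks up "automobile" from a document that never uses the word "car", which is the vocabulary mismatch the abstract describes. A practical system would replace the raw counts with a proper ranking function and a term-weighting scheme.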



Author information

Corresponding author

Correspondence to Muhammad Ahsan Raza.

Cite this article

Raza, M.A., Mokhtar, R. & Ahmad, N. A survey of statistical approaches for query expansion. Knowl Inf Syst 61, 1–25 (2019). https://doi.org/10.1007/s10115-018-1269-8

  • DOI: https://doi.org/10.1007/s10115-018-1269-8
