One of the major challenges in Web search is correctly interpreting users’ intent. Query expansion is a well-known approach for determining the intent of the user by addressing the vocabulary mismatch problem. A limitation of current query expansion approaches is that the relations between the query terms and the expanded terms are limited. In this paper, we capture users’ intent through query expansion. We build on earlier work in the area by adopting a pseudo-relevance feedback approach; however, we advance the state of the art by proposing an approach for feature learning within the process of query expansion. Specifically, we use the Wikipedia corpus as the feedback collection space and identify the best features within this context for term selection in two models, one supervised and one unsupervised. We compare our work with state-of-the-art query expansion techniques; the results show promising robustness and improved precision.
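
For readers unfamiliar with pseudo-relevance feedback, the general idea can be illustrated with a minimal sketch. This is a generic textbook-style loop, not the feature-learning method proposed in the paper: the overlap-based ranker, the tf-idf term score, the function names (`rank`, `expand_query`), and the four-document toy corpus standing in for Wikipedia are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def rank(query_terms, docs):
    """Rank documents by query-term occurrence count.

    A stand-in for a real retrieval model; documents with no match are dropped.
    """
    scores = []
    for i, doc in enumerate(docs):
        tf = Counter(tokenize(doc))
        scores.append((sum(tf[t] for t in query_terms), i))
    # Highest score first; ties broken by document index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for score, i in scores if score > 0]


def expand_query(query, docs, k=2, n_terms=2):
    """Expand a query with the best-scoring terms from the top-k documents.

    The top-k documents are assumed to be relevant (the "pseudo" part),
    and candidate terms are scored with a simple tf-idf weight.
    """
    query_terms = tokenize(query)
    feedback = rank(query_terms, docs)[:k]

    # Document frequency over the whole collection.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))

    # Term frequency inside the feedback documents only.
    tf = Counter()
    for i in feedback:
        tf.update(tokenize(docs[i]))

    n = len(docs)
    scored = {t: tf[t] * math.log(n / df[t]) for t in tf if t not in query_terms}
    # Sort by descending score, then alphabetically for deterministic ties.
    best = sorted(scored, key=lambda t: (-scored[t], t))[:n_terms]
    return query_terms + best


# Hypothetical toy corpus; a real system would use a large feedback collection.
docs = [
    "jaguar car engine speed engine",
    "jaguar car dealer engine",
    "jaguar cat jungle habitat",
    "python snake jungle",
]
expanded = expand_query("jaguar car", docs)
print(expanded)
```

In this toy setting the expansion terms come only from the two documents about cars, so the ambiguous query is pushed toward the automotive sense. How candidate terms are scored and selected is exactly the step where a learned feature set could replace the fixed tf-idf weight used here.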

Cite this article
Keikha, A., Ensan, F. & Bagheri, E. Query expansion using pseudo relevance feedback on wikipedia. J Intell Inf Syst 50, 455–478 (2018). https://doi.org/10.1007/s10844-017-0466-3
DOI: https://doi.org/10.1007/s10844-017-0466-3