Abstract
Pseudo Relevance feedback (PRF) based query expansion approaches assumes that the top ranked retrieved documents are relevant. But this assumption is not always true; it may also possible that a PRF document may contain different topics, which may or may not be relevant to the query terms even if the documents are judged relevant. In this paper our focus is to capture the limitation of PRF based query expansion and propose a hybrid method to improve the performance of PRF based query expansion by combining corpus based term co-occurrence information and semantic information of term. Firstly, the paper suggest use of corpus based term co-occurrence approach to select an optimal combination of query terms from a pool of terms obtained using PRF based query expansion. Second, we use semantic similarity approach to rank the query expansion terms obtained from top feedback documents. Third, we combine co-occurrence and semantic similarity together to rank the query expansion terms obtained from first step on the basis of semantic similarity. The experiments were performed on FIRE ad hoc and TREC-3 benchmark datasets of information retrieval. The results show significant improvement in terms of precision, recall and mean average precision (MAP). This experiments shows that the combination of both techniques in an intelligent way gives us goodness of both of them. As this is the first attempt in this direction there is a large scope of improving these techniques.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Van Rijsbergen, C.J.: A theoretical basis for the use of co-occurrence data in information Retrieval. Journal of Documentation 33, 106–119 (1977)
Robertson, S.E., Walker, S., Beaulieu, M.H.: Okapi at TREC-7. In: Proceedings of the Seventh Text REtrieval Conference. Gaithersburg, USA (1998)
Kobayakawa, M., Kinjo, S., Hoshi, M., Ohmori, T., Yamamoto, A.: Fast Computation of Similarity Based on Jaccard Coefficient for Composition-Based Image Retrieval. In: Muneesawang, P., Wu, F., Kumazawa, I., Roeksabutr, A., Liao, M., Tang, X. (eds.) PCM 2009. LNCS, vol. 5879, pp. 949–955. Springer, Heidelberg (2009)
Miller, G.A., Beckwith, R., Fellbaum, C.D., Gross, D., Miller, K.: WordNet: An online lexical database. Int. J. Lexicograph. 3(4), 235–244 (1990)
Resnik, P.: Semantic Similarity in Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research 11, 95–130 (1999)
Wu, Z., Palmer, M.: Verb Semantics and Lexical Selection. In: Annual Meeting of the Associations for Computational Linguistic, Las Cruces, New, Mexico, pp. 133–138 (1994)
Leacock, C., Miller, G.A., Chodorow, M.: Combining Local Context and WordNet Similarity for Word Sense Identification. Journal of Computational Linguistic, 265–283 (1998)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Singh, J., Sharan, A. (2015). Co-occurrence and Semantic Similarity Based Hybrid Approach for Improving Automatic Query Expansion in Information Retrieval. In: Natarajan, R., Barua, G., Patra, M.R. (eds) Distributed Computing and Internet Technology. ICDCIT 2015. Lecture Notes in Computer Science, vol 8956. Springer, Cham. https://doi.org/10.1007/978-3-319-14977-6_45
Download citation
DOI: https://doi.org/10.1007/978-3-319-14977-6_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14976-9
Online ISBN: 978-3-319-14977-6
eBook Packages: Computer ScienceComputer Science (R0)