Abstract
Apart from the frequency of terms in a document collection, the distribution of words plays an important role in determining the relevance of documents for a given search query. This paper introduces word distribution analysis, a novel approach that uses descriptive statistics to compute a compressed representation of word positions in a document corpus. Based on this statistical approximation, two methods for improving the evaluation of document relevance are proposed: (a) a relevance ranking procedure based on how query terms are distributed over the initially retrieved documents, and (b) a query expansion technique based on the overlap between the distributions of terms in the top-ranked documents. Experimental results on the TREC-8 document collection show that the proposed approach yields an improvement of about 6.6% over the term frequency/inverse document frequency weighting scheme, without applying query reformulation or relevance feedback techniques.
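As a rough illustration of what a compressed, statistics-based representation of word positions could look like, the sketch below summarizes the normalized positions of a term in one document by a five-number summary (minimum, quartiles, maximum) and measures how much the interquartile ranges of two terms overlap. The abstract does not specify which descriptive statistics the paper uses, so the choice of a five-number summary, the normalization by document length, and the names `position_summary` and `iqr_overlap` are all assumptions made for illustration only.

```python
from statistics import quantiles


def position_summary(tokens, term):
    """Five-number summary of the normalized positions of `term`.

    Assumption: positions are normalized by document length, and the
    summary is (min, Q1, median, Q3, max). Returns None if the term
    does not occur in the document.
    """
    n = len(tokens)
    # enumerate yields ascending indices, so `pos` is already sorted.
    pos = [i / n for i, tok in enumerate(tokens) if tok == term]
    if not pos:
        return None
    if len(pos) == 1:
        # A single occurrence: every statistic collapses to that position.
        return (pos[0],) * 5
    q1, med, q3 = quantiles(pos, n=4, method="inclusive")
    return (pos[0], q1, med, q3, pos[-1])


def iqr_overlap(summary_a, summary_b):
    """Length of the overlap between two interquartile ranges (0.0 if disjoint)."""
    lo = max(summary_a[1], summary_b[1])
    hi = min(summary_a[3], summary_b[3])
    return max(0.0, hi - lo)


# Hypothetical toy document, not taken from the paper.
doc = "a b a c a d a b".split()
summary_a = position_summary(doc, "a")  # (0.0, 0.1875, 0.375, 0.5625, 0.75)
summary_b = position_summary(doc, "b")  # (0.125, 0.3125, 0.5, 0.6875, 0.875)
overlap = iqr_overlap(summary_a, summary_b)  # 0.25
```

Each term is stored as five numbers per document regardless of how often it occurs, which is the sense in which such a summary is compressed. A larger overlap could be read as a hint that two terms tend to occur in the same region of a document, which is the kind of signal a distribution-based query expansion might exploit.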
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Galeas, P., Freisleben, B. (2008). Word Distribution Analysis for Relevance Ranking and Query Expansion. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_43
DOI: https://doi.org/10.1007/978-3-540-78135-6_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)