Word Distribution Analysis for Relevance Ranking and Query Expansion

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Abstract

Apart from the frequency of terms in a document collection, the distribution of words plays an important role in determining the relevance of documents for a given search query. This paper introduces word distribution analysis, a novel approach that uses descriptive statistics to calculate a compressed representation of word positions in a document corpus. Based on this statistical approximation, two methods for improving the evaluation of document relevance are proposed: (a) a relevance ranking procedure based on how query terms are distributed over initially retrieved documents, and (b) a query expansion technique based on overlapping the distributions of terms in the top-ranked documents. Experimental results obtained for the TREC-8 document collection demonstrate that the proposed approach leads to an improvement of about 6.6% over the term frequency/inverse document frequency (TF-IDF) weighting scheme without applying query reformulation or relevance feedback techniques.
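The abstract does not say which descriptive statistics the authors use or how distributions are overlapped. As a rough illustration only, the sketch below assumes a five-number summary (minimum, quartiles, maximum) of normalized token positions as the compressed representation, and measures overlap as the shared length of two interquartile ranges. The function names `position_summary` and `iqr_overlap` are hypothetical and are not taken from the paper.

```python
from statistics import quantiles


def position_summary(tokens, term):
    """Five-number summary of the normalized positions of `term` in `tokens`.

    Assumption: positions are scaled to [0, 1] so documents of different
    lengths are comparable. Returns None if the term does not occur.
    """
    n = len(tokens)
    positions = [
        (i / (n - 1) if n > 1 else 0.0)
        for i, tok in enumerate(tokens)
        if tok == term
    ]
    if not positions:
        return None
    if len(positions) == 1:
        # A single occurrence has no spread; repeat the position five times.
        p = positions[0]
        return (p, p, p, p, p)
    q1, median, q3 = quantiles(positions, n=4, method="inclusive")
    return (min(positions), q1, median, q3, max(positions))


def iqr_overlap(summary_a, summary_b):
    """Length of the overlap between two interquartile ranges (0.0 if disjoint).

    Assumption: this stands in for "overlapping the distributions" of two
    terms; the paper may define overlap differently.
    """
    low = max(summary_a[1], summary_b[1])
    high = min(summary_a[3], summary_b[3])
    return max(0.0, high - low)


if __name__ == "__main__":
    doc = "a b a c a".split()
    print(position_summary(doc, "a"))
    print(position_summary(doc, "b"))
```

Under these assumptions, a term whose occurrences cluster in one part of a document has a narrow interquartile range, and two terms with strongly overlapping ranges tend to occur in the same region of the text. That is one way a candidate expansion term could be compared against a query term.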



Editor information

Alexander Gelbukh

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Galeas, P., Freisleben, B. (2008). Word Distribution Analysis for Relevance Ranking and Query Expansion. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_43

  • DOI: https://doi.org/10.1007/978-3-540-78135-6_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78134-9

  • Online ISBN: 978-3-540-78135-6

  • eBook Packages: Computer Science, Computer Science (R0)
