Improving Effectiveness of Query Expansion Using Information Theoretic Approach

  • Conference paper
Trends in Applied Intelligent Systems (IEA/AIE 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6097)

Abstract

Automatic query expansion is a well-known method for improving the performance of information retrieval systems. In this paper we propose information theoretic measures to improve the efficiency of co-occurrence based automatic query expansion. We use a local approach based on pseudo relevance feedback. Candidate expansion terms are selected from the top N documents using a co-occurrence based approach and are then ranked using two different information theoretic measures. The first is the standard Kullback-Leibler divergence (KLD); the second is a variant of KLD. Experiments were performed on the TREC-1 dataset. The results suggest that there is scope for improving co-occurrence based query expansion by using information theoretic measures. Extensive experiments were carried out to select two important parameters: the number of top-ranked documents (N) to use and the number of terms to use for expansion.
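As an illustration of the ranking step described in the abstract, the following is a minimal sketch of KLD-based term scoring in its commonly used form, score(t) = p_R(t) * log(p_R(t) / p_C(t)), where p_R is the term's relative frequency in the top-ranked (pseudo-relevant) documents and p_C its relative frequency in the whole collection. The abstract does not give the exact formula or the KLD variant used in the paper, so this formulation, the function names `kld_scores` and `top_expansion_terms`, and the toy data are all assumptions made for illustration; the co-occurrence based candidate selection is not shown.

```python
import math
from collections import Counter


def kld_scores(feedback_docs, collection_docs):
    """Score each feedback term by p_R(t) * log(p_R(t) / p_C(t)).

    Both arguments are lists of tokenised documents (lists of strings).
    This is an assumed, commonly used KLD formulation, not necessarily
    the exact one used in the paper.
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())

    scores = {}
    for term, count in fb_counts.items():
        p_r = count / fb_total                 # probability in top-N documents
        p_c = coll_counts[term] / coll_total   # probability in the collection
        if p_c == 0:
            continue  # term missing from collection statistics; skip it
        scores[term] = p_r * math.log(p_r / p_c)
    return scores


def top_expansion_terms(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy collection; the first two documents play the role of
# the top N documents returned for the original query.
collection = [
    ["query", "expansion", "retrieval"],
    ["retrieval", "system"],
    ["cat", "dog"],
    ["dog", "bird"],
]
feedback = collection[:2]

scores = kld_scores(feedback, collection)
expansion = top_expansion_terms(scores, k=2)
```

In this sketch the two parameters mentioned in the abstract correspond to the number of documents passed as `feedback_docs` (N) and the argument `k` (number of expansion terms).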

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Imran, H., Sharan, A. (2010). Improving Effectiveness of Query Expansion Using Information Theoretic Approach. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13025-0_1

  • DOI: https://doi.org/10.1007/978-3-642-13025-0_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13024-3

  • Online ISBN: 978-3-642-13025-0

  • eBook Packages: Computer Science, Computer Science (R0)
