Improving Retrieval Effectiveness by Using Key Terms in Top Retrieved Documents

  • Conference paper
Advances in Information Retrieval (ECIR 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3408)

Abstract

In this paper, we propose a method for improving the precision of the top retrieved documents in Chinese information retrieval, where the query is a short description, by re-ordering the documents returned by the initial retrieval. The re-ordering has two stages. First, we identify the terms in the query and their importance by using information derived from the top N (N <= 30) documents of the initial retrieval. Second, we re-order the top K (N << K) retrieved documents according to which of these query terms they contain. Concretely, we automatically extract key terms from the top N retrieved documents, collect the key terms that also occur in the query together with their document frequencies in those N documents, and use the collected terms to re-order the initially retrieved documents. Each collected term is assigned a weight based on its length and its document frequency in the top N retrieved documents, and each document is re-ranked by the sum of the weights of the collected terms it contains. In experiments on 42 query topics from the NTCIR3 Cross Lingual Information Retrieval (CLIR) dataset, the method yields an average improvement of 17.8%-27.5% for the top 10 documents and 6.6%-26.9% for the top 100 documents, depending on whether relaxed or rigid relevance judgment is used and on the parameter settings.
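
The re-ranking steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the weighting formula or the key-term extraction procedure, so the weight `len(term) * df / N`, the plain substring matching, and the names `collect_query_terms`, `term_weight`, and `rerank` are all assumptions made for the example. Key-term extraction is treated as an external step that supplies `key_terms`.

```python
from typing import Dict, List


def collect_query_terms(query: str, key_terms: List[str], top_docs: List[str]) -> Dict[str, int]:
    """Keep the key terms that occur in the query and map each one
    to its document frequency in the top-N documents."""
    collected: Dict[str, int] = {}
    for term in set(key_terms):
        if term and term in query:
            df = sum(1 for doc in top_docs if term in doc)
            if df > 0:
                collected[term] = df
    return collected


def term_weight(term: str, df: int, n: int) -> float:
    # ASSUMPTION: the abstract only says the weight depends on term length
    # and document frequency; this product is a hypothetical stand-in.
    return len(term) * df / n


def rerank(query: str, key_terms: List[str], ranked_docs: List[str],
           n: int = 30, k: int = 1000) -> List[str]:
    """Re-order the top-k documents by the summed weights of the collected
    terms they contain; documents beyond rank k keep their initial order."""
    top_n = ranked_docs[:n]
    collected = collect_query_terms(query, key_terms, top_n)
    weights = {t: term_weight(t, df, len(top_n)) for t, df in collected.items()}

    head = ranked_docs[:k]
    scored = [
        (sum(w for t, w in weights.items() if t in doc), rank)
        for rank, doc in enumerate(head)
    ]
    # Higher score first; ties fall back to the initial retrieval rank.
    order = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [head[rank] for _, rank in order] + ranked_docs[k:]
```

A call such as `rerank(query, key_terms, ranked_docs, n=30, k=1000)` returns the same documents in a new order. Breaking ties by the initial rank is a design choice of this sketch, so that documents containing none of the collected terms stay in their original relative order.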

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lingpeng, Y., Donghong, J., Guodong, Z., Yu, N. (2005). Improving Retrieval Effectiveness by Using Key Terms in Top Retrieved Documents. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_13

  • DOI: https://doi.org/10.1007/978-3-540-31865-1_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25295-5

  • Online ISBN: 978-3-540-31865-1

  • eBook Packages: Computer Science, Computer Science (R0)