Evaluating Document-to-Document Relevance Based on Document Language Model: Modeling, Implementation and Performance Evaluation

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Abstract

Evaluating document-to-document relevance is very important to many advanced applications such as IR, text mining and natural language processing. Because users' uncertainty makes document relevance very hard to define mathematically, the concept of topical relevance is widely accepted in most research fields. It suggests that a document relevance model should explain whether the document representation describes the document's topical content and whether the matching method reveals the topical differences among documents. However, current document-to-document relevance models, such as the vector space model and string distance, do not explicitly emphasize the perspective of topical relevance. This paper uses a document language model to represent the topical content of a document, explains why the model can reveal document topics, and then establishes two distributional similarity measures based on the document language model to evaluate document-to-document relevance. An experiment on the TREC test collection compares these measures with the vector space model, and the results show that the Kullback-Leibler divergence measure with Jelinek-Mercer smoothing significantly outperforms the vector space model.
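The abstract names the Kullback-Leibler divergence measure with Jelinek-Mercer smoothing but does not give its formulas. As a rough illustration only, the Python sketch below shows one common way such a measure can be computed over unigram document language models. The function names, the interpolation weight `lam=0.5`, and the toy documents are assumptions made for this example and are not taken from the paper, whose exact formulation may differ.

```python
import math
from collections import Counter


def jm_language_model(tokens, collection_counts, collection_size, lam=0.5):
    """Build a Jelinek-Mercer smoothed unigram model and return it as p(word).

    p(w | d) = (1 - lam) * count(w, d) / |d| + lam * count(w, C) / |C|
    The weight lam is an assumed value, not one reported in the paper.
    """
    doc_counts = Counter(tokens)
    doc_len = len(tokens)

    def prob(word):
        p_doc = doc_counts[word] / doc_len if doc_len else 0.0
        p_coll = collection_counts[word] / collection_size
        return (1.0 - lam) * p_doc + lam * p_coll

    return prob


def kl_divergence(p, q, vocabulary):
    """KL(p || q) = sum over w of p(w) * log(p(w) / q(w))."""
    total = 0.0
    for word in vocabulary:
        pw = p(word)
        if pw > 0.0:
            total += pw * math.log(pw / q(word))
    return total


# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": "language model retrieval smoothing language".split(),
    "d2": "language model smoothing retrieval model".split(),
    "d3": "football match goal score match".split(),
}
collection_counts = Counter(w for toks in docs.values() for w in toks)
collection_size = sum(collection_counts.values())
vocabulary = sorted(collection_counts)

models = {
    name: jm_language_model(toks, collection_counts, collection_size, lam=0.5)
    for name, toks in docs.items()
}

# A lower divergence means the two document models are closer in topic.
kl_d1_d2 = kl_divergence(models["d1"], models["d2"], vocabulary)
kl_d1_d3 = kl_divergence(models["d1"], models["d3"], vocabulary)
print(f"KL(d1 || d2) = {kl_d1_d2:.4f}")
print(f"KL(d1 || d3) = {kl_d1_d3:.4f}")
```

Because every vocabulary word occurs somewhere in the collection and `lam` is greater than zero, the smoothed probabilities are never zero, so the logarithm is always defined. In this toy setup the two documents about language models come out closer to each other than to the football document. Note that KL divergence is asymmetric, so `KL(d1 || d2)` and `KL(d2 || d1)` can differ.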

Supported by the National Natural Science Foundation of China under Grant No.60173051 and the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institution of the Ministry of Education, China.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, G., Li, X., Bao, Y., Wang, D. (2005). Evaluating Document-to-Document Relevance Based on Document Language Model: Modeling, Implementation and Performance Evaluation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_63


  • DOI: https://doi.org/10.1007/978-3-540-30586-6_63

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24523-0

  • Online ISBN: 978-3-540-30586-6

  • eBook Packages: Computer Science, Computer Science (R0)
