Abstract
In this paper, we address the problem of document re-ranking in information retrieval, which is usually performed after initial retrieval to improve the ranking of relevant documents. We propose a method that automatically constructs a term resource specific to the document collection and then applies that resource to document re-ranking. The term resource consists of a list of terms extracted from the documents, together with their weights and correlations, which are computed after initial retrieval. Term weighting based on local and global distribution makes the re-ranking insensitive to different choices of pseudo-relevant documents, while term correlation helps avoid bias toward any specific concept embedded in the queries. Experiments with NTCIR3 data show that the approach not only improves the performance of initial retrieval but also contributes significantly to standard query expansion.
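The re-ranking idea summarized above can be illustrated with a minimal sketch. The abstract does not give the paper's actual weighting or correlation formulas, so everything here is an assumption made for illustration: the function names `build_term_weights` and `rerank`, the cutoff `k`, and the weight formula (local document frequency in the top-k results multiplied by a logarithmic inverse global document frequency). Term correlation is not modeled, and documents are assumed to be already segmented into terms.

```python
import math
from collections import Counter


def build_term_weights(docs, ranking, k):
    """Weight terms seen in the top-k documents of the initial ranking.

    Assumed formula (not the paper's): the fraction of top-k documents
    containing the term (local distribution) times log(N / df), where df
    is the number of documents in the collection containing the term
    (global distribution).
    """
    top_docs = [docs[doc_id] for doc_id in ranking[:k]]
    n_docs = len(docs)
    global_df = Counter(t for terms in docs.values() for t in set(terms))
    local_df = Counter(t for terms in top_docs for t in set(terms))
    return {
        t: (local_df[t] / len(top_docs)) * math.log(n_docs / global_df[t])
        for t in local_df
    }


def rerank(docs, ranking, weights):
    """Re-score each document by the summed weights of its distinct terms.

    Python's sort is stable, so documents with equal scores keep their
    order from the initial ranking.
    """
    scores = {
        doc_id: sum(weights.get(t, 0.0) for t in set(docs[doc_id]))
        for doc_id in ranking
    }
    return sorted(ranking, key=lambda doc_id: -scores[doc_id])


# Toy collection: each document is a list of already-segmented terms.
docs = {
    "d1": ["bank", "river"],
    "d2": ["bank", "loan", "rate"],
    "d3": ["loan", "rate", "credit"],
    "d4": ["river", "fish"],
}
initial_ranking = ["d1", "d2", "d3", "d4"]

weights = build_term_weights(docs, initial_ranking, k=2)
reranked = rerank(docs, initial_ranking, weights)
print(reranked)
```

In this toy run, "d2" moves ahead of "d1" because it contains more of the terms that are frequent among the top-ranked documents. A fuller version would also use term correlations, as the abstract describes, so that a single query concept does not dominate the scores.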
Additional information
The first author is supported by NSF (60773011) and NSF (90820005), and the first two authors are supported by the Wuhan University 985 Project (985yk004).
About this article
Cite this article
Ji, D., Zhao, S. & Xiao, G. Chinese document re-ranking based on automatically acquired term resource. Lang Resources & Evaluation 43, 385–406 (2009). https://doi.org/10.1007/s10579-009-9106-z
DOI: https://doi.org/10.1007/s10579-009-9106-z