ABSTRACT
We present a way of estimating term weights for Information Retrieval (IR) that uses term co-occurrence as a measure of dependency between terms. We apply a random walk graph-based ranking algorithm to a graph that encodes terms and their co-occurrence dependencies in text, and from it we derive term weights that quantify how much a term contributes to its context. Evaluation on two TREC collections and 350 topics shows that the random walk-based term weights perform at least comparably to traditional tf-idf term weighting, and outperform it when the distance between co-occurring terms is between 6 and 30 terms.
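The procedure summarised above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an undirected, unweighted co-occurrence graph in which two distinct terms are linked when they occur at most `max_distance` tokens apart, and it scores vertices with the TextRank-style recurrence S(v) = (1 - d) + d · Σ_{u ∈ N(v)} S(u) / |N(u)|, using the conventional PageRank damping factor d = 0.85. The function names `build_cooccurrence_graph` and `random_walk_weights`, the convergence tolerance, and the iteration cap are hypothetical choices made for this example.

```python
# Hypothetical sketch of random-walk term weighting over a co-occurrence graph.
# Assumptions (not from the source): undirected unweighted edges, a
# maximum-distance window, TextRank-style scoring with damping d = 0.85.


def build_cooccurrence_graph(tokens, max_distance):
    """Return {term: set_of_neighbours}; two distinct terms are linked when
    they occur at most `max_distance` positions apart (assumed window rule)."""
    neighbours = {}
    for i, term in enumerate(tokens):
        neighbours.setdefault(term, set())  # every term becomes a vertex
        for j in range(i + 1, min(i + max_distance + 1, len(tokens))):
            other = tokens[j]
            if other != term:  # ignore self-loops
                neighbours[term].add(other)
                neighbours.setdefault(other, set()).add(term)
    return neighbours


def random_walk_weights(neighbours, damping=0.85, tol=1e-6, max_iter=200):
    """Iterate S(v) = (1 - d) + d * sum_{u in N(v)} S(u) / |N(u)|
    until the largest per-vertex change drops below `tol`."""
    scores = {term: 1.0 for term in neighbours}
    for _ in range(max_iter):
        new_scores = {
            term: (1 - damping)
            + damping * sum(scores[u] / len(neighbours[u]) for u in adj)
            for term, adj in neighbours.items()
        }
        delta = max(
            (abs(new_scores[t] - scores[t]) for t in scores), default=0.0
        )
        scores = new_scores
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    graph = build_cooccurrence_graph("a c b c e".split(), max_distance=1)
    for term, weight in sorted(random_walk_weights(graph).items()):
        print(f"{term}: {weight:.4f}")
```

With `max_distance=1`, the token sequence `a c b c e` yields a star graph centred on `c`, so `c` receives the largest weight (about 1.92) and the three leaves receive equal weights (about 0.69 each). With this unnormalised recurrence the scores sum to the number of vertices at convergence when no vertex is isolated, so a weight above 1 marks a term that is more central than average. Varying `max_distance` is how one would explore the co-occurrence distances discussed in the abstract; combining the resulting weights with idf in place of tf is one plausible use, assumed here rather than taken from the source.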
Index Terms
- Random walk term weighting for information retrieval
Recommendations
Fixed versus dynamic co-occurrence windows in TextRank term weights for information retrieval
SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
TextRank is a variant of PageRank typically used in graphs that represent documents, and where vertices denote terms and edges denote relations between terms. Quite often the relation between terms is simple term co-occurrence within a fixed window of k ...
Information Retrieval by Modified Term Weighting Method Using Random Walk Model with Query Term Position Ranking
ICSPS '09: Proceedings of the 2009 International Conference on Signal Processing Systems
Term weighting is a core idea behind any information retrieval technique which has crucial importance in document ranking. In graph based ranking algorithm, terms within a document are represented as a graph of that document. Term weights for ...
Term weighting for information retrieval based on term's discrimination power
One of the most important research topics in Information Retrieval is term weighting for document ranking and retrieval, such as TFIDF, BM25, etc. We propose a term weighting method that utilizes past retrieval results consisting of the queries that ...