Abstract
In this paper, a series of window-based methods is proposed for information retrieval. In contrast to the traditional tf-idf model, our approaches are based on two new key notions. The first is that the closer together the query words appear in a document, the larger the similarity value between the query and the document. The second is that some query words, such as named entities and base noun phrases (baseNPs), which we call “Core Words”, are much more important than other words and should receive special weights. We implement these notions in three models: the Simple Window-based Model, the Dynamic Window-based Model and the Core Window-based Model. Our models compute the similarity between a query and a document from the importance and the distribution of the query words in the document. The algorithms are tested on TREC data. The experiments indicate that our window-based methods outperform most traditional methods, such as tf-idf and Okapi BM25, and that the Core Window-based Model is the best and most robust model across various queries.
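The abstract states the two notions (proximity of query words and extra weight for Core Words) but not the formulas of the three models. The Python sketch below illustrates the general idea under illustrative assumptions only: the functions `shortest_cover` and `window_score`, the score W × (1 + n / L), and the default `core_weight` of 2.0 are all hypothetical and are not the authors' Simple, Dynamic or Core Window-based Models. Here W is the summed weight of the distinct query words found in a document, n is how many there are, and L is the length of the shortest token window containing all of them, so a document whose matching query words sit closer together scores higher.

```python
from typing import AbstractSet, Iterable, Optional, Sequence


def shortest_cover(tokens: Sequence[str], terms: AbstractSet[str]) -> Optional[int]:
    """Return the length of the shortest contiguous span of `tokens` that
    contains every word in `terms` at least once, or None if no such span exists."""
    need = len(terms)
    if need == 0:
        return None
    counts = {}    # occurrences of each query term inside the current window
    covered = 0    # number of distinct query terms currently in the window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in terms:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                covered += 1
        # Shrink the window from the left while it still covers all terms.
        while covered == need:
            span = right - left + 1
            if best is None or span < best:
                best = span
            dropped = tokens[left]
            if dropped in terms:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best


def window_score(query_terms: Iterable[str], doc_tokens: Sequence[str],
                 core_words: AbstractSet[str] = frozenset(),
                 core_weight: float = 2.0) -> float:
    """Hypothetical proximity score W * (1 + n / L) (an assumption, not the paper's formula).

    W: summed weight of distinct query terms found in the document
       (core_weight for "Core Words", 1.0 otherwise; an assumed weighting),
    n: number of those distinct terms,
    L: length of the shortest window that contains all of them.
    """
    present = set(query_terms) & set(doc_tokens)
    if not present:
        return 0.0
    weight = sum(core_weight if t in core_words else 1.0 for t in present)
    span = shortest_cover(doc_tokens, present)
    # span >= len(present), so the proximity factor lies in (1, 2].
    return weight * (1.0 + len(present) / span)


if __name__ == "__main__":
    query = ["window", "retrieval"]
    near = "window based retrieval works".split()
    far = "window methods are old but retrieval is new".split()
    # The document with the query words closer together scores higher.
    print(window_score(query, near) > window_score(query, far))  # True
```

Because L can never be smaller than n, the proximity factor stays between 1 and 2, so proximity can at most double a document's matched weight. The shortest window is found with a standard two-pointer sliding window, which takes time linear in the document length.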
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, Q., Zhao, J., Xu, B. (2005). Window-Based Method for Information Retrieval. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_13
DOI: https://doi.org/10.1007/978-3-540-30211-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science; Computer Science (R0)