Window-Based Method for Information Retrieval

  • Conference paper
Natural Language Processing – IJCNLP 2004 (IJCNLP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

In this paper, a series of window-based methods is proposed for information retrieval. Compared with the traditional tf-idf model, our approaches rest on two new key notions. First, the closer together the query words appear in a document, the larger the similarity value between the query and the document. Second, some query words, such as named entities and base noun phrases (baseNPs), which we call “Core Words”, are much more important than other words and should receive special weights. We implement these notions in three models: the Simple Window-based Model, the Dynamic Window-based Model, and the Core Window-based Model. Our models compute the similarity between a query and a document from the importance and the distribution of the query words in the document. The algorithms are evaluated on TREC data. The experiments indicate that our window-based methods outperform most traditional methods, such as tf-idf and Okapi BM25, and that the Core Window-based Model is the best-performing and most robust model across various queries.
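The abstract does not give the models' scoring formulas. What follows is only a minimal sketch of its two notions, built on our own assumptions. A document is scored by the shortest window of tokens that contains every query word present in it, so query words that sit closer together yield a higher score. Hypothetical `core_words` receive a larger weight, controlled by `core_weight`, which is set to 2.0 here purely for illustration. The names `min_covering_window` and `window_score` and the `weight / width` rule are illustrative choices. They are not the authors' Simple, Dynamic, or Core Window-based Model.

```python
from collections import Counter
from typing import Optional, Sequence, Set, Tuple


def min_covering_window(tokens: Sequence[str], terms: Set[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end), inclusive, of the shortest span of `tokens` that
    contains every term in `terms`, or None if `terms` is empty or not covered."""
    if not terms:
        return None
    counts: Counter = Counter()
    covered = 0  # number of distinct terms currently inside the window
    best: Optional[Tuple[int, int]] = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in terms:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers all terms.
        while covered == len(terms):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = tokens[left]
            if dropped in terms:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best


def window_score(doc: Sequence[str], query: Sequence[str],
                 core_words: Set[str], core_weight: float = 2.0) -> float:
    """Hypothetical proximity score: summed query-term weights divided by
    the width of the tightest window holding all query terms found in `doc`."""
    doc_vocab = set(doc)
    present = {t for t in query if t in doc_vocab}
    span = min_covering_window(doc, present)
    if span is None:
        return 0.0  # no query term occurs in the document
    start, end = span
    width = end - start + 1
    # Assumed weighting: core words (e.g. named entities, baseNPs) count more.
    weight = sum(core_weight if t in core_words else 1.0 for t in present)
    return weight / width


if __name__ == "__main__":
    query = ["bank", "china", "rates"]
    core = {"china"}
    near = "the bank of china raised rates while the central bank held".split()
    far = "china is large and its bank sets interest rates".split()
    print(window_score(near, query, core))  # 4 / 5 = 0.8
    print(window_score(far, query, core))   # 4 / 9, about 0.444
```

Both example documents contain all three query words. The first one ranks higher because its query words fall within a 5-token window, while the second needs a 9-token window. A closer approximation of the paper would also need term weights in the style of tf-idf, plus whatever window adjustments the Dynamic Window-based Model makes. The abstract specifies neither.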

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, Q., Zhao, J., Xu, B. (2005). Window-Based Method for Information Retrieval. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_13

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science (R0)
