
Efficient Top-k Document Retrieval Using a Term-Document Binary Matrix

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7097)

Abstract

Current web search engines perform well for “navigational queries.” However, because they rely on simple conjunctive Boolean filters, such engines perform poorly for “informational queries.” Informational queries would be better served by a web search engine that uses an information retrieval model combined with enhancement techniques such as query expansion and relevance feedback, and building such an engine requires a way to evaluate that model efficiently. In this paper, we describe a novel extension of an existing top-k query processing technique. By adding a simple data structure called a “term-document binary matrix,” we evaluate top-k queries more efficiently, even when the queries have been expanded. Experiments on the TREC GOV2 data set, using expanded versions of the evaluation queries that accompany it, show that the extended technique achieves significant performance gains over existing techniques.
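The abstract does not spell out how the matrix is consulted during query processing, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes an impact-ordered index (each posting list sorted by a precomputed per-term score), a document score equal to the sum of its per-term impacts, and a top-k loop in the No Random Access (NRA) style that reads posting lists only sequentially. The binary matrix is modeled as one Python integer bitset per term. In this sketch it helps in two ways: an unread term that a candidate document does not contain contributes nothing to that document's upper bound, and the loop can tell when a candidate's partial score is already exact. All names (`build_binary_matrix`, `contains`, `top_k`, `exhaustive`) and the `(doc_id, impact)` posting layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index layout: term -> [(doc_id, impact), ...], sorted by descending impact.
Postings = Dict[str, List[Tuple[int, float]]]


def build_binary_matrix(postings: Postings) -> Dict[str, int]:
    """Term-document binary matrix: one bitset per term; bit d is set iff the term occurs in doc d."""
    matrix: Dict[str, int] = {}
    for term, plist in postings.items():
        bits = 0
        for doc_id, _ in plist:
            bits |= 1 << doc_id
        matrix[term] = bits
    return matrix


def contains(matrix: Dict[str, int], term: str, doc_id: int) -> bool:
    """Membership test: does `term` occur in document `doc_id`?"""
    return ((matrix.get(term, 0) >> doc_id) & 1) == 1


def top_k(query: List[str], postings: Postings, matrix: Dict[str, int], k: int):
    """NRA-style top-k (sequential access only) with bounds tightened by the binary matrix."""
    lists = {t: postings.get(t, []) for t in query}
    pos = {t: 0 for t in query}
    partial: Dict[int, float] = {}  # lower bound: sum of impacts read so far
    seen: Dict[int, Set[str]] = {}  # query terms already read for each document

    def frontier(t: str) -> float:
        # Largest impact still unread in list t (0.0 once the list is exhausted).
        return lists[t][pos[t]][1] if pos[t] < len(lists[t]) else 0.0

    def missing(d: int) -> List[str]:
        # Unread terms that d actually contains; terms absent from d add nothing.
        return [t for t in query if t not in seen[d] and contains(matrix, t, d)]

    def upper(d: int) -> float:
        return partial[d] + sum(frontier(t) for t in missing(d))

    while True:
        progressed = False
        for t in query:  # one round-robin step of sorted access
            if pos[t] < len(lists[t]):
                d, w = lists[t][pos[t]]
                pos[t] += 1
                partial[d] = partial.get(d, 0.0) + w
                seen.setdefault(d, set()).add(t)
                progressed = True
        ranked = sorted(partial, key=lambda d: (-partial[d], d))
        if not progressed:  # every list exhausted, so every score is exact
            return [(d, partial[d]) for d in ranked[:k]]
        if len(ranked) < k:
            continue
        best, rest = ranked[:k], ranked[k:]
        kth = partial[best[-1]]
        unseen_bound = sum(frontier(t) for t in query)  # bound for never-seen documents
        if (kth >= unseen_bound
                and all(kth >= upper(d) for d in rest)
                and all(not missing(d) for d in best)):  # best scores are exact
            return [(d, partial[d]) for d in best]


def exhaustive(query: List[str], postings: Postings, k: int):
    """Reference answer: score every document fully, then keep the k best."""
    scores: Dict[int, float] = {}
    for t in query:
        for d, w in postings.get(t, []):
            scores[d] = scores.get(d, 0.0) + w
    return sorted(scores.items(), key=lambda x: (-x[1], x[0]))[:k]
```

Comparing `top_k` with `exhaustive` on a small index is a cheap way to validate the stopping test. In this sketch, the matrix pays off whenever a candidate lacks some query terms, a situation that tends to become more common as expanded queries grow longer.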

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fujita, E., Oyama, K. (2011). Efficient Top-k Document Retrieval Using a Term-Document Binary Matrix. In: Salem, M.V.M., Shaalan, K., Oroumchian, F., Shakery, A., Khelalfa, H. (eds) Information Retrieval Technology. AIRS 2011. Lecture Notes in Computer Science, vol 7097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25631-8_27

  • DOI: https://doi.org/10.1007/978-3-642-25631-8_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25630-1

  • Online ISBN: 978-3-642-25631-8

  • eBook Packages: Computer Science, Computer Science (R0)
