Fast document ranking for large scale information retrieval

  • Conference paper
Applications of Databases (ADB 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 819)

Abstract

For large document databases, evaluation of ranked queries can be expensive in CPU time, memory usage, and disk traffic. It has been shown that memory usage can be dramatically reduced by use of a simple filtering heuristic that eliminates most documents from consideration. In this paper we show that, by designing inverted indexes explicitly to support filtering, CPU time and disk traffic can also be dramatically reduced. The principle of the index design is that inverted lists are sorted by in-document frequency rather than by document number. In the context of compressed indexes such a re-ordering could result in a large increase in index size. We show, however, that it is possible to use the re-ordering to achieve a net reduction in index size, regardless of whether the index is compressed. Together, these techniques simultaneously achieve savings in CPU time, disk traffic, memory usage, and index size.
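
The index design named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the filtering rule or the similarity measure, so the fixed `min_freq` threshold, the frequency-times-IDF weighting, and the function names `build_index` and `rank` are all assumptions made for illustration. The sketch shows only why frequency-sorted lists suit filtering: once one posting fails the filter, the rest of that list can be skipped without being read.

```python
from collections import Counter, defaultdict
import math


def build_index(docs):
    """Build inverted lists sorted by decreasing in-document frequency.

    docs maps a document id to its list of terms. Each inverted list holds
    (frequency, doc_id) pairs, highest frequency first; ties are broken by
    document id.
    """
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term, freq in Counter(terms).items():
            index[term].append((freq, doc_id))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[0], p[1]))
    return dict(index)


def rank(index, num_docs, query_terms, min_freq=1, top_k=10):
    """Score documents, ignoring postings with frequency below min_freq.

    min_freq is a hypothetical fixed threshold (an assumption, not the
    paper's rule). Because each list is frequency-sorted, the first posting
    that fails the filter ends the scan of that list.
    """
    scores = defaultdict(float)
    postings_read = 0
    for term in query_terms:
        postings = index.get(term, [])
        if not postings:
            continue
        # Assumed weighting: a simple inverse document frequency.
        idf = math.log(1 + num_docs / len(postings))
        for freq, doc_id in postings:
            if freq < min_freq:
                break  # every later posting has an equal or lower frequency
            postings_read += 1
            scores[doc_id] += freq * idf
    ranked = sorted(scores.items(), key=lambda s: (-s[1], s[0]))
    return ranked[:top_k], postings_read


# Toy collection, invented for this example.
docs = {
    1: ["cat", "cat", "cat", "dog"],
    2: ["cat", "dog", "dog"],
    3: ["cat", "fish"],
    4: ["dog"],
}
index = build_index(docs)
top, postings_read = rank(index, len(docs), ["cat", "dog"], min_freq=2)
```

With a document-ordered list the same filter would have to inspect every posting, since high-frequency entries could appear anywhere in the list. In the sketch, `postings_read` counts the entries actually examined and so stands in for the CPU and disk work that the early exit avoids.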

Editor information

Witold Litwin, Tore Risch

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Persin, M., Zobel, J., Sacks-Davis, R. (1994). Fast document ranking for large scale information retrieval. In: Litwin, W., Risch, T. (eds) Applications of Databases. ADB 1994. Lecture Notes in Computer Science, vol 819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58183-9_53

  • DOI: https://doi.org/10.1007/3-540-58183-9_53

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58183-3

  • Online ISBN: 978-3-540-48473-8

  • eBook Packages: Springer Book Archive
