Fast document ranking for large scale information retrieval

Persin, Michael; Zobel, Justin; Sacks-Davis, Ron

doi:10.1007/3-540-58183-9_53


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 819)

Included in the following conference series:

International Conference on Applications of Databases

Abstract

For large document databases, evaluation of ranked queries can be expensive in CPU time, memory usage, and disk traffic. It has been shown that memory usage can be dramatically reduced by use of a simple filtering heuristic that eliminates most documents from consideration. In this paper we show that, by designing inverted indexes explicitly to support filtering, CPU time and disk traffic can also be dramatically reduced. The principle of the index design is that inverted lists are sorted by in-document frequency rather than by document number. In the context of compressed indexes, such a re-ordering could result in a large increase in index size. We show, however, that it is possible to use the re-ordering to achieve a net reduction in index size, regardless of whether the index is compressed. Together, these techniques simultaneously achieve savings in CPU time, disk traffic, memory usage, and index size.
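The abstract combines two ideas. The first is to order each inverted list by in-document frequency f_dt, so that a filtering evaluator can stop reading a list once the remaining postings are too weak to matter. The second is to group postings of equal frequency, so that document numbers within a group can still be stored as d-gaps.

The Python sketch below illustrates both ideas under simplifying assumptions, and it is not the paper's exact formulation. The following choices are hypothetical and used only for illustration:

- the idf-style weight `log(1 + N / f_t)`;
- the per-posting contribution `f_dt * w_t`;
- the thresholds `c_ins` (create a new accumulator) and `c_add` (update an existing one), both expressed as fractions of the largest accumulator seen so far.

There is also no cosine normalisation and no bit-level compression.

```python
import math
from collections import defaultdict


def build_freq_sorted_index(docs):
    """Map each term to postings (doc_id, f_dt) sorted by f_dt descending,
    ties broken by ascending doc_id. docs is a list of token lists."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, tokens in enumerate(docs):
        for term in tokens:
            counts[term][doc_id] += 1
    return {
        term: sorted(postings.items(), key=lambda p: (-p[1], p[0]))
        for term, postings in counts.items()
    }


def frequency_groups(postings):
    """Split a frequency-sorted list into runs of equal f_dt.

    Each run stores its frequency once, and its ascending doc numbers as
    d-gaps (first gap measured from -1, so every gap is >= 1), which is the
    form a gap-based integer code would then compress.
    """
    groups = []
    for doc_id, f in postings:
        if groups and groups[-1][0] == f:
            groups[-1][1].append(doc_id)
        else:
            groups.append((f, [doc_id]))
    return [(f, [b - a for a, b in zip([-1] + ids, ids)]) for f, ids in groups]


def filtered_rank(index, query, n_docs, c_ins=0.1, c_add=0.01, k=10):
    """Rank documents, reading each frequency-sorted list only as far as needed.

    c_ins and c_add are hypothetical threshold constants for this sketch.
    """
    def weight(term):  # hypothetical idf-style term weight
        return math.log(1 + n_docs / len(index[term]))

    # Process the most selective (highest-weight) terms first.
    terms = sorted({t for t in query if t in index}, key=lambda t: (-weight(t), t))
    acc = {}      # doc_id -> partial similarity
    s_max = 0.0   # largest accumulator value so far; never decreases
    for term in terms:
        w_t = weight(term)
        for doc_id, f in index[term]:
            contribution = f * w_t
            if contribution < c_add * s_max:
                # Remaining postings have f_dt <= f and s_max can only grow,
                # so no later posting in this list can pass: stop reading.
                break
            if doc_id not in acc and contribution < c_ins * s_max:
                continue  # too weak to justify a new accumulator
            acc[doc_id] = acc.get(doc_id, 0.0) + contribution
            s_max = max(s_max, acc[doc_id])
    return sorted(acc.items(), key=lambda item: (-item[1], item[0]))[:k]
```

For example, with `docs = [["a", "b", "a"], ["b", "c"], ["a", "a", "a", "c"], ["d"]]`, the list for `a` is `[(2, 3), (0, 2)]`, and its frequency groups are `[(3, [3]), (2, [1])]`. Raising `c_ins` keeps low-scoring documents out of the accumulator set. Raising `c_add` ends a list scan early, which is where a frequency-sorted layout saves disk reads and CPU time compared with a document-ordered one.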

Author information

Authors and Affiliations

Dept. of Computer Science, RMIT, GPO Box 2476V, 3001, Melbourne, Australia
Michael Persin & Justin Zobel
Faculty of Applied Science, RMIT, GPO Box 2476V, 3001, Melbourne, Australia
Ron Sacks-Davis

Editor information

Witold Litwin, Tore Risch

About this paper

Cite this paper

Persin, M., Zobel, J., Sacks-Davis, R. (1994). Fast document ranking for large scale information retrieval. In: Litwin, W., Risch, T. (eds) Applications of Databases. ADB 1994. Lecture Notes in Computer Science, vol 819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58183-9_53

DOI: https://doi.org/10.1007/3-540-58183-9_53
Published: 31 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58183-3
Online ISBN: 978-3-540-48473-8
eBook Packages: Springer Book Archive
