Abstract
For large document databases, evaluation of ranked queries can be expensive in CPU time, memory usage, and disk traffic. It has been shown that memory usage can be dramatically reduced by use of a simple filtering heuristic that eliminates most documents from consideration. In this paper we show that, by designing inverted indexes explicitly to support filtering, CPU time and disk traffic can also be dramatically reduced. The principle of the index design is that inverted lists are sorted by in-document frequency rather than by document number. In the context of compressed indexes, such a re-ordering could result in a large increase in index size. We show, however, that it is possible to use the re-ordering to achieve a net reduction in index size, regardless of whether the index is compressed. Together, these techniques simultaneously achieve savings in CPU time, disk traffic, memory usage, and index size.
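The idea of a frequency-sorted inverted list can be illustrated with a minimal sketch. This is not the implementation from the paper: the function names `build_index` and `rank`, the single cutoff `min_freq`, and the simple frequency-times-log weight are all assumptions chosen for illustration. The sketch only shows why sorting each list by decreasing in-document frequency lets a filter stop reading a list early, since every posting after the first one that fails the cutoff must fail it too.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Map each term to postings [(doc_id, f_dt), ...] sorted by decreasing f_dt.

    f_dt is the number of times the term occurs in the document.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, f_dt in counts.items():
            index[term].append((doc_id, f_dt))
    for postings in index.values():
        # Highest in-document frequency first; ties broken by document number.
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)


def rank(index, num_docs, query, min_freq=2, top_k=3):
    """Rank documents, skipping postings whose f_dt is below min_freq.

    min_freq is a hypothetical fixed cutoff standing in for the filtering
    heuristic; the weight f_dt * idf is an illustrative choice only.
    Returns (top_k ranked (doc_id, score) pairs, number of postings read).
    """
    accumulators = defaultdict(float)
    postings_read = 0
    for term in query.lower().split():
        postings = index.get(term, [])
        if not postings:
            continue
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, f_dt in postings:
            if f_dt < min_freq:
                # The list is sorted by decreasing f_dt, so every remaining
                # posting is also below the cutoff: stop reading this list.
                break
            accumulators[doc_id] += f_dt * idf
            postings_read += 1
    ranked = sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k], postings_read


docs = [
    "cat cat cat dog",
    "cat dog dog",
    "cat",
    "dog dog dog dog cat cat",
]
index = build_index(docs)
ranked, postings_read = rank(index, len(docs), "cat dog")
print([doc_id for doc_id, _ in ranked], postings_read)
```

On this toy collection the query reads 4 of the 7 postings in the two lists and ranks documents 3, 0, and 1, in that order. With lists sorted by document number, the same filter would have to inspect every posting to find the ones that pass the cutoff.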
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Persin, M., Zobel, J., Sacks-Davis, R. (1994). Fast document ranking for large scale information retrieval. In: Litwin, W., Risch, T. (eds) Applications of Databases. ADB 1994. Lecture Notes in Computer Science, vol 819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58183-9_53
DOI: https://doi.org/10.1007/3-540-58183-9_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58183-3
Online ISBN: 978-3-540-48473-8
eBook Packages: Springer Book Archive