Abstract
Declining disk and CPU costs have kindled renewed interest in efficient document indexing techniques. This paper addresses the problem of incremental updates of inverted lists using a dual-structure index that dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. The behavior of this index is studied using a synthetically generated document collection and a simulation model of the algorithm. The index structure is shown to support rapid insertion of documents and fast queries, and to scale well to large document collections and many disks.
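The idea of separating short and long inverted lists can be illustrated with a small in-memory sketch. This is not the authors' algorithm or their on-disk layout. The class name `DualStructureIndex`, the `promote_at` threshold, and the chunked representation of long lists are all hypothetical choices made for illustration. The sketch only shows the general pattern: rare words keep small lists that are updated in place, and a list that grows past a threshold migrates to a separate append-oriented structure.

```python
from collections import defaultdict


class DualStructureIndex:
    """Toy dual-structure inverted index (illustrative only)."""

    def __init__(self, promote_at=4):
        # promote_at is a hypothetical threshold, not a value from the paper.
        self.promote_at = promote_at
        self.short = defaultdict(list)  # word -> small postings list, updated in place
        self.long = {}                  # word -> list of append-only chunks

    def add_document(self, doc_id, words):
        """Incrementally add one document's distinct words to the index."""
        for word in set(words):
            if word in self.long:
                self._append_long(word, doc_id)
                continue
            postings = self.short[word]
            postings.append(doc_id)
            if len(postings) >= self.promote_at:
                # The list has grown long: migrate it to the long-list structure.
                self.long[word] = [self.short.pop(word)]

    def _append_long(self, word, doc_id):
        chunks = self.long[word]
        if len(chunks[-1]) >= self.promote_at:
            chunks.append([])  # start a new chunk instead of rewriting old ones
        chunks[-1].append(doc_id)

    def postings(self, word):
        """Return the document ids for a word, in insertion order."""
        if word in self.long:
            return [doc_id for chunk in self.long[word] for doc_id in chunk]
        return list(self.short.get(word, []))
```

In this sketch a query reads from whichever structure currently holds the word, so callers do not need to know whether a list is short or long.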
This research was sponsored by the Advanced Research Projects Agency (ARPA) of the Department of Defense under Grant No. MDA972-92-J-1029 with the Corporation for National Research Initiatives (CNRI). The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsement, either expressed or implied, of ARPA, the U.S. Government, or CNRI.
Copyright information
© 1994 Springer-Verlag London Limited
About this paper
Cite this paper
Shoens, K., Tomasic, A., Garcia-Molina, H. (1994). Synthetic Workload Performance Analysis of Incremental Updates. In: Croft, W.B., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_34
DOI: https://doi.org/10.1007/978-1-4471-2099-5_34
Publisher Name: Springer, London
Print ISBN: 978-3-540-19889-5
Online ISBN: 978-1-4471-2099-5
eBook Packages: Springer Book Archive