SIGIR ’94 pp 329–338

Synthetic Workload Performance Analysis of Incremental Updates

  • Conference paper

Abstract

Declining disk and CPU costs have kindled a renewed interest in efficient document indexing techniques. This paper addresses the problem of incremental updates of inverted lists using a dual-structure index that dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. The behavior of this index is studied using a synthetically generated document collection and a simulation model of the algorithm. The index structure is shown to support rapid insertion of documents and fast queries, and to scale well to large document collections and many disks.
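The idea of keeping short and long inverted lists in separate structures can be illustrated with a small sketch. This is not the authors' implementation: the class name `DualStructureIndex`, the `threshold` parameter, and the rule that a list becomes "long" once it exceeds that many postings are all assumptions made for illustration, since the abstract does not state the actual criterion or the on-disk layout.

```python
from collections import defaultdict


class DualStructureIndex:
    """Toy in-memory inverted index that keeps short and long lists apart.

    Assumption: a list counts as "long" once it holds more than
    `threshold` postings. The abstract does not give the real criterion.
    """

    def __init__(self, threshold=4):
        self.threshold = threshold
        self.short_lists = defaultdict(list)  # word -> small posting list
        self.long_lists = {}                  # word -> large posting list

    def add_document(self, doc_id, words):
        """Insert one document incrementally, one distinct word at a time."""
        for word in set(words):
            if word in self.long_lists:
                self.long_lists[word].append(doc_id)
                continue
            postings = self.short_lists[word]
            postings.append(doc_id)
            if len(postings) > self.threshold:
                # The list has outgrown the short structure: migrate it.
                self.long_lists[word] = self.short_lists.pop(word)

    def lookup(self, word):
        """Return a copy of the posting list from whichever structure holds it."""
        if word in self.long_lists:
            return list(self.long_lists[word])
        return list(self.short_lists.get(word, []))

    def is_long(self, word):
        """Report whether the word's list currently lives in the long structure."""
        return word in self.long_lists


index = DualStructureIndex(threshold=2)
index.add_document(1, ["disk", "index"])
index.add_document(2, ["index", "query"])
index.add_document(3, ["index"])
print(index.lookup("index"), index.is_long("index"), index.is_long("disk"))
```

In this sketch a frequent word such as "index" migrates to the long structure after its third posting, while rare words stay in the short structure. A real system would map the two structures to different disk layouts, which is where the separate optimization of each list type would take place.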

This research was sponsored by the Advanced Research Projects Agency (ARPA) of the Department of Defense under Grant No. MDA972-92-J-1029 with the Corporation for National Research Initiatives (CNRI). The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsement, either expressed or implied, of ARPA, the U.S. Government or CNRI.

Copyright information

© 1994 Springer-Verlag London Limited

About this paper

Cite this paper

Shoens, K., Tomasic, A., Garcia-Molina, H. (1994). Synthetic Workload Performance Analysis of Incremental Updates. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_34

  • DOI: https://doi.org/10.1007/978-1-4471-2099-5_34

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-19889-5

  • Online ISBN: 978-1-4471-2099-5

  • eBook Packages: Springer Book Archive
