Abstract
The data sets for many of today’s computer applications are too large to fit within the computer’s internal memory and must instead be stored on external storage devices such as disks. A major performance bottleneck can be the input/output communication (or I/O) between the external and internal memories. In this paper we discuss a variety of online data structures for external memory, some very old and some very new, such as hashing (for dictionaries), B-trees (for dictionaries and 1-D range search), buffer trees (for batched dynamic problems), interval trees with weight-balanced B-trees (for stabbing queries), priority search trees (for 3-sided 2-D range search), and R-trees and other spatial structures. We also discuss several open problems along the way.
Supported in part by the Army Research Office through MURI grant DAAH04-96-1-0013 and by the National Science Foundation through research grants CCR-9522047 and EIA-9870734.
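As a rough illustration of the I/O bottleneck the abstract describes, the sketch below counts block reads for one dictionary lookup in a balanced search tree. It assumes a simplified cost model in which every node visit costs exactly one I/O and every node has the same fanout. The function name `search_ios` and the sample sizes are hypothetical choices for illustration, not taken from the paper.

```python
def search_ios(n_items: int, fanout: int) -> int:
    """Estimate block reads for one lookup among n_items keys.

    Assumed model (not from the paper): a balanced tree in which each
    node occupies one disk block and has `fanout` children, so a lookup
    reads one block per level.
    """
    if n_items < 1 or fanout < 2:
        raise ValueError("need n_items >= 1 and fanout >= 2")
    levels = 1          # at least one block is always read
    capacity = fanout   # keys distinguishable with the levels counted so far
    while capacity < n_items:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    n = 10**9
    # One binary node per block versus a B-tree-like node with 1000 children.
    print("fanout 2:   ", search_ios(n, 2), "block reads")
    print("fanout 1000:", search_ios(n, 1000), "block reads")
```

Under these assumptions, packing many keys into each block cuts the number of reads per lookup from roughly log base 2 of N to roughly log base B of N, which is the basic reason B-tree-style structures suit external memory.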
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vitter, J.S. (1999). Online Data Structures in External Memory. In: Wiedermann, J., van Emde Boas, P., Nielsen, M. (eds) Automata, Languages and Programming. Lecture Notes in Computer Science, vol 1644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48523-6_10
DOI: https://doi.org/10.1007/3-540-48523-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66224-2
Online ISBN: 978-3-540-48523-0
eBook Packages: Springer Book Archive