Abstract
Merged indexes are B-trees that contain multiple traditional indexes and interleave their records based on a common sort order. In relational databases, merged indexes implement ‘‘master-detail clustering’’ of related records, e.g., orders and order details. Thus, merged indexes shift de-normalization from the logical level of tables and rows to the physical level of indexes and records, which is a much more appropriate place for it. For object-oriented applications, clustering can reduce the I/O cost for joining rows in related tables to a fraction compared to traditional indexes, with additional beneficial effects on buffer pool requirements.
Prior research has covered merged indexes without providing much guidance for their implementation. Enabling the design proposed here is a strict separation of B-tree and index into two layers of abstraction. In addition, this paper provides algorithms (i) for data layout including bitmap indexes and column stores, (ii) for concurrency control and recovery including locking of individual keys and of complex objects, (iii) for update operations including bulk insertions, bulk deletions, and deferred index maintenance, (iv) for adding and removing individual indexes within a merged index, (v) for enforcement of relational integrity constraints from uniqueness constraints to foreign key constraints, and (vi) for query processing including caching in query execution plans dominated by nested iteration and index navigation.
In the proposed design for merged indexes, the set of tables, views, and indexes can evolve without restriction. The set of clustering columns can also evolve freely. A relational query processor can search and update index records just as in traditional indexes. With these abilities, the proposed design may finally bring general master-detail clustering to traditional databases together with its advantages in performance and cost.
Zusammenfassung
Merged Indexes (eine Art physischer Sammelindex) sind spezielle B-Bäume, die mehrere herkömmliche Indexe vereinigen und ihre Einträge in einer gemeinsamen Sortierreihenfolge speichern. In relationalen Datenbanken implementieren Merged Indexes eine ,,hierarchische Clusterbildung‘‘ (master-detail clustering) zusammengehöriger Sätze, beispielsweise von Bestellungen und Bestellposten. Damit verlagern Merged Indexes die Denormalisierung von der logischen Ebene der Tabellen und Zeilen auf die physische Ebene der Indexe und Sätze, was eine viel besser geeignete Stelle dafür ist. Im Vergleich zum Einsatz herkömmlicher Indexe können bei objektorientierten Anwendungen durch Clusterbildung die E/A-Kosten für den Verbund in Beziehung stehender Tabellen auf einen Bruchteil gesenkt werden, wobei zusätzlich Spareffekte bei der Puffernutzung erzielt werden.
In früheren Publikationen werden Merged Indexes ohne spezielle Hinweise zu ihrer Implementierung erwähnt. Unsere Vorgehensweise bei ihrer Realisierung sieht eine strikte Trennung von B-Baum und Index in zwei Abstraktionsebenen vor. Zusätzlich liefert unser Beitrag Algorithmen (i) für die Datenabbildung einschließlich der Bitlisten-Indexe und der spaltenorientierten Speicherung, (ii) für Mehrbenutzersynchronisation und Recovery einschließlich spezieller Sperren für einzelne Schlüsselwerte und komplexe Objekte, (iii) für Aktualisierungsoperationen einschließlich Einfügungen und Löschungen großer Datenmengen, (iv) für das Hinzufügen und Entfernen einzelner Indexe innerhalb eines Merged Index, (v) zur Kontrolle relationaler Integritätsbedingungen wie Unique- und Fremdschlüsselbedingungen und (vi) für die Anfrageverarbeitung einschließlich Caching-Vorkehrungen bei Anfrageausführungsplänen mit hohem Anteil an geschachtelter Iteration und Index-Navigation.
In unserem Entwurf für Merged Indexes kann die Menge der beteiligten Tabellen, Sichten und Indexe unbeschränkt erweitert werden. Auch die Menge der Attribute (Spalten) mit Clusterbildung unterliegt keinen Beschränkungen. Ein relationaler Anfrageprozessor kann Indexeinträge genauso wie in herkömmlichen Indexen suchen und modifizieren. Diese Eigenschaften unseres Entwurfs ermöglichen es schließlich, das Konzept der allgemeinen hierarchischen Clusterbildung in herkömmliche Datenbanken zu integrieren und dabei ihre Vorteile hinsichtlich Leistung und Kosten zu erhalten.
Similar content being viewed by others
References
Abadi DJ (2007) Column stores for wide and sparse data. CIDR 2007
Agarwal S, Blakeley JA, Casey T, Delaney K, Galindo-Legaria CA, Graefe G, Rys M, Zwilling MJ (2001) Microsoft SQL Server. Chapter 27, p 969–1006. In: Silberschatz A, Korth HF, Sudarshan S (eds) Database System Concepts 4th Edn. McGraw-Hill Higher Education
Bayer R, Unterauer K (1977) Prefix B-Trees. ACM TODS 2(1):11–26
Fernandez PM (1994) Red Brick Warehouse: A Read-Mostly RDBMS for Open SMP Platforms. SIGMOD 1994, p 492
Graefe G (2003) Sorting and Indexing with Partitioned B-Trees. CIDR 2003
Graefe G (2003) Executing Nested Queries. BTW Conf. 2003, pp 58–77
Graefe G (2006) B-tree indexes, interpolation search, and skew. DaMoN 2006, p 5
Graefe G (2007) Hierarchical locking in B-tree Indexes. BTW Conf. 2007
Graefe G (2007) Algorithms for merged indexes. BTW Conf. 2007
Graefe G: Surrogate compression for columnar storage. To appear in ACM SIGMOD Record
Graefe G, Bunker R, Cooper S (1998) Hash joins and hash teams in Microsoft SQL Server. VLDB 1998, pp 86–97
Gray J, Graefe G (1997) The Five-Minute Rule ten years later, and other computer storage rules of thumb. SIGMOD Record 26(4):63–68
Gottemukkala V, Lehman TJ (1992) Locking and Latching in a Memory-Resident Database System. VLDB 1992, pp 533–544
Graefe G, Larson P (2001) B-tree indexes and CPU caches. ICDE 2001, pp 349–358
Gray J, Lorie RA, Putzolu GR, Traiger IL (1975) Granularity of Locks in a Large Shared Data Base. VLDB 1975, pp 428–451
Gassner P, Lohman GM, Schiefer KB, Wang Y (1993) Query Optimization in the IBM DB2 Family. IEEE Data Eng Bull 16(4):4–18
Gray J, Reuter A (1993) Transaction Processing: Concepts and Techniques. Morgan-Kaufman, San Mateo, CA
Graefe G, Shapiro LD (1991) Data compression and database performance. ACM/IEEE-CS Symp. on Applied Computing
Galindo-Legaria CA, Stefani S, Waas F (2004) Query Processing for SQL Updates. SIGMOD 2004:844–849
Graefe G, Zwilling MJ (2004) Transaction support for indexed views. SIGMOD 2004:323–334
Härder T (1978) Implementing a Generalized Access Path Structure for a Relational Database System. ACM TODS 3(3):285–298
Hamilton J (2007) An architecture for modular data centers. CIDR 2007
Harizopoulos S, Liang V, Abadi DJ, Madden S (2006) Performance Tradeoffs in Read-Optimized Databases. VLDB 2006, pp 487–498
Härder T, Reuter A (1983) Principles of Transaction-Oriented Database Recovery. ACM Comput Surv 15(4):287–317
Härder T, Reinert J (1996) Access Path Support for Referential Integrity in SQL2. VLDB J 5(3):196–214
Hsu WW, Smith AJ (2004) The performance impact of I/O optimizations and disk improvements. IBM J Res Dev 48(2):255–289
Joshi AM (1991) Adaptive Locking Strategies in a Multi-node Data Sharing Environment. VLDB 1991, pp 181–191
Korth HF (1983) Locking Primitives in a Database System. J ACM 30(1):55–79
Lomet DB (1993) Key Range Locking Strategies for Improved Concurrency. VLDB 1993, pp 655–664
Lomet DB (2001) The Evolution of Effective B-tree: Page Organization and Techniques: A Personal Account. SIGMOD Record 30(3):64–69
Lehman TJ, Carey MJ (1989) A Concurrency Control Algorithm for Memory-Resident Database Systems. FODO 1989, pp 490–504
Leslie H, Jain R, Birdsall D, Yaghmai H (1995) Efficient Search of Multi-Dimensional B-Trees. VLDB 1995, pp 710–719
Lomet DB, Tuttle MR (1995) Redo Recovery after System Crashes. VLDB 1995, pp 457–468
Mohan C, ARIES/KVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes. VLDB 1990, pp 392–405
McJones PR (1997) The 1995 SQL Reunion: People, Project, and Politics, May 29, 1995. Digital System Research Center Report SRC1997-018
Mohan C, Levine F (1992) ARIES/IM: An Efficient and High Concurrency Index Management Method Using Write-Ahead Logging. SIGMOD 1992, pp 371–380
Mohan C, Narang I (1992) Algorithms for Creating Indexes for Very Large Tables without Quiescing Updates. SIGMOD Conf. 1992, pp 361–370
O’Neil PE (1986) The Escrow Transactional Method. ACM TODS 11(4):405–430
O’Neil PE, Graefe G (1995) Multi-table joins through bitmapped join indices. SIGMOD Record 24(3):8–11
Orenstein JA, Merrett TH (1984) A Class of Data Structures for Associative Searching. PODS 1984, pp 181–190
Pöss M, Potapov D (2003) Data Compression in Oracle. VLDB 2003, pp 937–947
Ramsak F, Markl V, Fenk R, Zirkel M, Elhardt K, Bayer R (2000) Integrating the UB-Tree into a Database System Kernel. VLDB 2000, pp 263–272
Short-stroking. Storage magazine. http://storagemagazine.techtarget.com, July 2005
Stonebraker M, Abadi DJ, Batkin A, Chen X, Cherniack M, Ferreira M, Lau E, Lin A, Madden S, O’Neil EJ, O’Neil PE, Rasin A, Tran N, Zdonik SB (2005) C-Store: A Column-oriented DBMS. VLDB 2005, pp 553–564
Severance DG, Lohman GM (1976) Differential Files: Their Application to the Maintenance of Large Databases. ACM TODS 1(3):256–267
Tsatalos OG, Solomon MH, Ioannidis YE (1996) VLDB J 5(2):101–118
Valduriez P (1987) Join Indices. ACM TODS 12(2):218–246
Wu K, Otoo EJ, Shoshani A (2006) Optimizing bitmap indices with efficient compression. ACM TODS 31(1):1–38
Author information
Authors and Affiliations
Corresponding author
Additional information
CR subject classification
H2.2; H2.4; H3.2; H3.3
Rights and permissions
About this article
Cite this article
Graefe, G. Master-detail clustering using merged indexes . Informatik Forsch. Entw. 21, 127–145 (2007). https://doi.org/10.1007/s00450-007-0022-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00450-007-0022-4
Keywords
- Relational database
- B-tree index
- Master-detail clustering
- Concurrency control
- Index utilities
- Query processing