Abstract
We provide a benchmark measuring star schema queries retrieving data from a fact table with Where clause column restrictions on dimension tables. Clustering is crucial to performance with modern disk technology, since retrievals with filter factors down to 0.0005 are now performed most efficiently by sequential table search rather than by indexed access. DB2’s Multi-Dimensional Clustering (MDC) provides methods to "dice" the fact table along a number of orthogonal "dimensions", but only when these dimensions are columns in the fact table. The diced cells cluster fact rows on several of these "dimensions" at once so queries restricting several such columns can access crucially localized data, with much faster query response. Unfortunately, columns of dimension tables of a star schema are not usually represented in the fact table. In this paper, we show a simple way to adjoin physical copies of dimension columns to the fact table, dicing data to effectively cluster query retrieval, and explain how such dicing can be achieved on database products other than DB2. We provide benchmark measurements to show successful use of this methodology on three commercial database products.
Keywords
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Bhattacharjee, B., et al.: Efficient Query Processing for Multi-Dimensional Clustered Tables in DB2
Chen, X., O’Neil, P., O’Neil, E.: Adjoined Dimension Column Clustering to Improve Data Warehouse Query Performance. Poster presentation at ICDE 2008 (2008), http://www.cs.umb.edu/~poneil/icde08_adc.pdf
Cranston, L.: MDC Performance: Customer Examples and Experiences, http://www.research.ibm.com/mdc/db2.pdf
IBM: Designing Multidimensional Clustering (MDC) Tables, http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/c0007238.htm
IBM Research: DB2’s Multi-Dimensional Clustering, http://www.research.ibm.com/mdc/
Kennedy, J.: Introduction to Multidimensional Clustering with DB2 UDB LUW. In: IBM DB2 Information Management Technical Conference, Orlando, FL (September 2005)
Kimball, R., Ross, M.: The Data Warehouse Toolkit, 2nd edn. Wiley, Chichester (2002)
Lightstone, S., Teorey, T., Nadeau, T.: Physical Database Design. Morgan Kaufman, San Francisco (2007)
O’Neil, P.: The Set Query Benchmark. In: Gray, J. (ed.) The Benchmark Handbook for Database and Transaction Processing Systems, pp. 209–245. Morgan Kauffmann, San Francisco (1993), http://www.cs.umb.edu/~poneil/SetQBM.pdf
O’Neil, P., O’Neil, E.: Database Principles, Programming, and Performance, 2nd edn. Morgan Kaufmann, San Francisco (2001)
O’Neil, P., O’Neil, E., Chen, X.: The Star Schema Benchmark, http://www.cs.umb.edu/~poneil/StarSchemaB.pdf
Oracle: Partitioning in Oracle Database 10g Release 2 (May 2005), http://www.oracle.com/solutions/business_intelligence/partitioning.html
Padmanabhan, S., et al.: Multi-Dimensional Clustering: A New Data Layout Scheme in DB2. In: Proceedings of the ACM SIGMOD Conference (2003)
Selinger, P., et al.: Access Path Selection in a Relational Database Management System. In: Proceedings of the ACM SIGMOD Conference, pp. 23–34 (1979)
Stonebraker, M., et al.: One Size Fits All? Part 2: Benchmarking Results, Keynote address. In: Proceedings of CIDR (2007), http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p20.pdf
TPC: TPC-DS, TPC Decision Support, under development, http://www.tpc.org/tpcds/default.asp
Nambiar, R., Poess, M.: The Making of TPC-DS. In: VLDB Proceedings (2006)
TPC: TPC-H Version 2.4.0, http://www.tpc.org/tpch/default.asp
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
O’Neil, P., O’Neil, E., Chen, X., Revilak, S. (2009). The Star Schema Benchmark and Augmented Fact Table Indexing. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking. TPCTC 2009. Lecture Notes in Computer Science, vol 5895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10424-4_17
Download citation
DOI: https://doi.org/10.1007/978-3-642-10424-4_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10423-7
Online ISBN: 978-3-642-10424-4
eBook Packages: Computer ScienceComputer Science (R0)