CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

Hammer, Joachim; Fu, Lixin

doi:10.1023/A:1025537315785

CubiST⁺⁺: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

Published: November 2003

Volume 14, pages 221–254, (2003)
Cite this article

Distributed and Parallel Databases Aims and scope Submit manuscript

Joachim Hammer¹ &
Lixin Fu²

88 Accesses
3 Citations
6 Altmetric
Explore all metrics

Abstract

We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We are focusing on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST⁺⁺ (Cubing with Statistics Trees Plus Families), represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST⁺⁺ encodes all possible aggregate views in the leaves of a new data structure called statistics tree (ST) during a one-time scan of the detailed data. In order to optimize the queries involving constraints on hierarchy levels of the underlying dimensions, we select andmaterialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family, which can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

S. Agarwal, R. Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, and S. Sarawagi, “On the computation of multidimensional aggregates,” in 22th International Conference on Very Large Data Bases, Mumbai (Bomabi), India, 1996.
R. Agrawal, A. Gupta, and S. Sarawagi, “Modeling multidimensional databases,” in Thirteenth International Conference on Database Engineering, Birmingham, UK, 1997.
Arbor Systems, “Large-scale data warehousing using hyperion essbase OLAP technology,” Arbor Systems, White Paper 1997.
K. Beyer and R. Ramakrishnan, “Bottom-up computation of sparse and iceberg CUBEs,” in ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, 1999.
S. Chaudhuri and U. Dayal, “Data warehousing and OLAP for decision support,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 26, pp. 507–508, 1997.
Google Scholar
S. Chaudhuri and U. Dayal, “An overview of data warehousing and OLAP technology,” SIGMOD Record, vol. 26, pp. 65–74, 1997.
Google Scholar
C.Y. Chan and Y.E. Ioannidis, “Bitmap index design and evaluation,” in 1998 ACM SIGMOD International Conference on Management of Data, Seattle, Washington, USA, 1998.
E.F. Codd, S.B. Codd, and C.T. Salley, “Beyond decision support,” in Computer World, vol. 27, 1993, www.arborsoft.com/OLAP.html.
E.F. Codd, S.B. Codd, and C.T. Salley, “Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate,” Technical Report 1993.
D. Comer, “The ubiquitous Btree,” ACM Computing Surveys, vol. 11, pp. 121–137, 1979.
Google Scholar
C.E. Dyreson, “Information retrieval from an incomplete data cube,” in Twenty-Second International Conference on Very Large Data Bases, Mumbai (Bombai), India, 1996.
L. Fu and J. Hammer, “CubiST: A new algorithm for improving the performance of ad-hoc OLAP queries,” 0ACM Third International Workshop on Data Warehousing and OLAP (DOLAP), Washington, DC, 2000.
S. Goil and A. Choudhary, “High performance OLAP and data mining on parallel computers,” Journal of Data Mining and Knowledge Discovery, vol. 1, pp. 391–417, 1997.
Google Scholar
S. Goil and A. Choudhary, “PARSIMONY: An infrastructure for parallel multidimensional analysis and data mining,” Journal of Parallel and Distributed Computing, vol. 61, pp. 285–321, 2001.
Google Scholar
J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh, “Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals,” Data Mining and Knowledge Discovery, vol. 1, pp. 29–53, 1997.
Google Scholar
H. Gupta and I. Mumick, “Selection of views to materialize under a maintenance cost constraint,” Stanford University, Technical Report, 1997.
V. Harinarayan, A. Rajaraman, and J.D. Ullman, “Implementing data cubes efficiently,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 25, pp. 205–216, 1996.
Google Scholar
Information Advantage, “Business intelligence,” White Paper, 1998.
Informix Corp., “Informix red brick decision server,” 2001, http://www.informix.com/redbrick/.
T. Johnson and D. Shasha, “Some approaches to index design for cube forests,” Bulletin of the Technical Committee on Data Engineering, IEEE Computer Society, vol. 20, pp. 27–35, 1997.
Google Scholar
W. Labio, D. Quass, and B. Adelberg, “Physical database design for data warehouses,” in International Conference on Database Engineering, Birmingham, England, 1997.
M. Lee and J. Hammer, “Speeding upwarehouse physical design using a randomized algorithm,” International Journal of Cooperative Information Systems (IJCIS)—Special Issue on Design and Management of Data Warehouses, vol. 10, pp. 327–354, 2001.
Google Scholar
D. Lomet, “Bulletin of the technical committee on data engineering,” in Special Issue on Materialized Views and Data Warehousing, J. Widom (Ed.), IEEE Computer Society, 1995, vol. 18.
Microsoft Corp., “Microsoft SQL server,” Microsoft, Seattle, WA, White Paper.
MicroStrategy Inc., “The case for relational OLAP,” MicroStrategy, White Paper.
P. O'Neil and D. Quass, “Improved query performance with variant indexes,” SIGMOD Record (ACMSpecial Interest Group on Management of Data), vol. 26, pp. 38–49, 1997.
Google Scholar
P.E. O'Neil, “Model 204 architecture and performance,” 2nd International Workshop on High Performance Transaction Systems, Asilomar, CA, 1987.
Oracle Corp., “Oracle express OLAP technology,” http://www.oracle.com/olap/index.html.
Oracle Corp., “Oracle express server documentation,” Oracle Corporation, Redwood Shores, CA, Documentation, August 2001.
Pilot Software Inc., “An introduction to OLAP multidimensional terminology and technology,” Pilot Software, Cambridge, MA, White Paper.
Redbrick Systems, “Decision-makers, business data and RISQL,” Informix, Los Gatos, CA, White Paper, 1997.
N. Roussopoulos, Y. Kotidis, and M. Roussopoulos, “Cubetree: Organization of and bulk updates on the data cube,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 26, pp. 89–99, 1997.
Google Scholar
N. Roussopoulos and D. Leifker, “Direct spatial search on pictorial databases using packed R-trees,” in 1985 ACM SIGMOD International Conference on Management of Data, Austin, Texas, 1985.
R. Srikant and R. Agrawal, “Mining quantitative association rules in large relational tables,” in 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, 1996.
J. Srivastava, J.S.E. Tan, and V.Y. Lum, “TBSAM: An access method for efficient processing of statistical queries,” IEEE Transactions on Knowledge and Data Engineering, vol. 1, pp. 414–423, 1989.
Google Scholar
Transaction Processing Performance Council, “The TPC BenchmarkTMH,” Transaction Processing Council, 2001, http://www.tpc.org/tpch/.
Transaction Processing Performance Council, “Transaction processing performance council,” 2001, http://www.tpc.org/.
W.P. Yan and P. Larson, “Eager aggregation and lazy aggregation,” in 21th International Conference on Very Large Data Bases, Zurich, Switzerland, 1995.
Y. Zhao, P.M. Deshpande, and J.F. Naughton, “An array-based algorithm for simultaneous multidimensional aggregates,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 26, pp. 159–170, 1997.
Google Scholar
Y. Zhuge, H. Garcia-Molina, and J.L. Wiener, “Consistency algorithms for multi-source warehouse view maintenance,” Distributed and Parallel Databases, vol. 6, pp. 7–40, 1998.
Google Scholar

Download references

Author information

Authors and Affiliations

Computer & Information Science & Eng, University of Florida, Gainesville, FL, 32611-6120, USA
Joachim Hammer
Division of Computer Science, University of North Carolina, Greensboro, Greensboro, NC, 27402-6170, USA
Lixin Fu

Authors

Joachim Hammer
View author publications
You can also search for this author in PubMed Google Scholar
Lixin Fu
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hammer, J., Fu, L. CubiST⁺⁺: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees. Distributed and Parallel Databases 14, 221–254 (2003). https://doi.org/10.1023/A:1025537315785

Download citation

Issue Date: November 2003
DOI: https://doi.org/10.1023/A:1025537315785

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

CubiST⁺⁺: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

Abstract

Access this article

Similar content being viewed by others

Making data visualization more efficient and effective: a survey

MongoDB Vs PostgreSQL: A comparative study on performance aspects

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Navigation

CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

Abstract

Access this article

Similar content being viewed by others

Making data visualization more efficient and effective: a survey

MongoDB Vs PostgreSQL: A comparative study on performance aspects

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation

CubiST⁺⁺: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees