Skip to main content
Log in

CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

  • Published:
Distributed and Parallel Databases Aims and scope Submit manuscript

Abstract

We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We are focusing on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST++ (Cubing with Statistics Trees Plus Families), represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST++ encodes all possible aggregate views in the leaves of a new data structure called statistics tree (ST) during a one-time scan of the detailed data. In order to optimize the queries involving constraints on hierarchy levels of the underlying dimensions, we select andmaterialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family, which can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. S. Agarwal, R. Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, and S. Sarawagi, “On the computation of multidimensional aggregates,” in 22th International Conference on Very Large Data Bases, Mumbai (Bomabi), India, 1996.

  2. R. Agrawal, A. Gupta, and S. Sarawagi, “Modeling multidimensional databases,” in Thirteenth International Conference on Database Engineering, Birmingham, UK, 1997.

  3. Arbor Systems, “Large-scale data warehousing using hyperion essbase OLAP technology,” Arbor Systems, White Paper 1997.

  4. K. Beyer and R. Ramakrishnan, “Bottom-up computation of sparse and iceberg CUBEs,” in ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, 1999.

  5. S. Chaudhuri and U. Dayal, “Data warehousing and OLAP for decision support,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 26, pp. 507–508, 1997.

    Google Scholar 

  6. S. Chaudhuri and U. Dayal, “An overview of data warehousing and OLAP technology,” SIGMOD Record, vol. 26, pp. 65–74, 1997.

    Google Scholar 

  7. C.Y. Chan and Y.E. Ioannidis, “Bitmap index design and evaluation,” in 1998 ACM SIGMOD International Conference on Management of Data, Seattle, Washington, USA, 1998.

  8. E.F. Codd, S.B. Codd, and C.T. Salley, “Beyond decision support,” in Computer World, vol. 27, 1993, www.arborsoft.com/OLAP.html.

  9. E.F. Codd, S.B. Codd, and C.T. Salley, “Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate,” Technical Report 1993.

  10. D. Comer, “The ubiquitous Btree,” ACM Computing Surveys, vol. 11, pp. 121–137, 1979.

    Google Scholar 

  11. C.E. Dyreson, “Information retrieval from an incomplete data cube,” in Twenty-Second International Conference on Very Large Data Bases, Mumbai (Bombai), India, 1996.

  12. L. Fu and J. Hammer, “CubiST: A new algorithm for improving the performance of ad-hoc OLAP queries,” 0ACM Third International Workshop on Data Warehousing and OLAP (DOLAP), Washington, DC, 2000.

  13. S. Goil and A. Choudhary, “High performance OLAP and data mining on parallel computers,” Journal of Data Mining and Knowledge Discovery, vol. 1, pp. 391–417, 1997.

    Google Scholar 

  14. S. Goil and A. Choudhary, “PARSIMONY: An infrastructure for parallel multidimensional analysis and data mining,” Journal of Parallel and Distributed Computing, vol. 61, pp. 285–321, 2001.

    Google Scholar 

  15. J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh, “Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals,” Data Mining and Knowledge Discovery, vol. 1, pp. 29–53, 1997.

    Google Scholar 

  16. H. Gupta and I. Mumick, “Selection of views to materialize under a maintenance cost constraint,” Stanford University, Technical Report, 1997.

  17. V. Harinarayan, A. Rajaraman, and J.D. Ullman, “Implementing data cubes efficiently,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 25, pp. 205–216, 1996.

    Google Scholar 

  18. Information Advantage, “Business intelligence,” White Paper, 1998.

  19. Informix Corp., “Informix red brick decision server,” 2001, http://www.informix.com/redbrick/.

  20. T. Johnson and D. Shasha, “Some approaches to index design for cube forests,” Bulletin of the Technical Committee on Data Engineering, IEEE Computer Society, vol. 20, pp. 27–35, 1997.

    Google Scholar 

  21. W. Labio, D. Quass, and B. Adelberg, “Physical database design for data warehouses,” in International Conference on Database Engineering, Birmingham, England, 1997.

  22. M. Lee and J. Hammer, “Speeding upwarehouse physical design using a randomized algorithm,” International Journal of Cooperative Information Systems (IJCIS)—Special Issue on Design and Management of Data Warehouses, vol. 10, pp. 327–354, 2001.

    Google Scholar 

  23. D. Lomet, “Bulletin of the technical committee on data engineering,” in Special Issue on Materialized Views and Data Warehousing, J. Widom (Ed.), IEEE Computer Society, 1995, vol. 18.

  24. Microsoft Corp., “Microsoft SQL server,” Microsoft, Seattle, WA, White Paper.

  25. MicroStrategy Inc., “The case for relational OLAP,” MicroStrategy, White Paper.

  26. P. O'Neil and D. Quass, “Improved query performance with variant indexes,” SIGMOD Record (ACMSpecial Interest Group on Management of Data), vol. 26, pp. 38–49, 1997.

    Google Scholar 

  27. P.E. O'Neil, “Model 204 architecture and performance,” 2nd International Workshop on High Performance Transaction Systems, Asilomar, CA, 1987.

  28. Oracle Corp., “Oracle express OLAP technology,” http://www.oracle.com/olap/index.html.

  29. Oracle Corp., “Oracle express server documentation,” Oracle Corporation, Redwood Shores, CA, Documentation, August 2001.

  30. Pilot Software Inc., “An introduction to OLAP multidimensional terminology and technology,” Pilot Software, Cambridge, MA, White Paper.

  31. Redbrick Systems, “Decision-makers, business data and RISQL,” Informix, Los Gatos, CA, White Paper, 1997.

  32. N. Roussopoulos, Y. Kotidis, and M. Roussopoulos, “Cubetree: Organization of and bulk updates on the data cube,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 26, pp. 89–99, 1997.

    Google Scholar 

  33. N. Roussopoulos and D. Leifker, “Direct spatial search on pictorial databases using packed R-trees,” in 1985 ACM SIGMOD International Conference on Management of Data, Austin, Texas, 1985.

  34. R. Srikant and R. Agrawal, “Mining quantitative association rules in large relational tables,” in 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, 1996.

  35. J. Srivastava, J.S.E. Tan, and V.Y. Lum, “TBSAM: An access method for efficient processing of statistical queries,” IEEE Transactions on Knowledge and Data Engineering, vol. 1, pp. 414–423, 1989.

    Google Scholar 

  36. Transaction Processing Performance Council, “The TPC BenchmarkTMH,” Transaction Processing Council, 2001, http://www.tpc.org/tpch/.

  37. Transaction Processing Performance Council, “Transaction processing performance council,” 2001, http://www.tpc.org/.

  38. W.P. Yan and P. Larson, “Eager aggregation and lazy aggregation,” in 21th International Conference on Very Large Data Bases, Zurich, Switzerland, 1995.

  39. Y. Zhao, P.M. Deshpande, and J.F. Naughton, “An array-based algorithm for simultaneous multidimensional aggregates,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 26, pp. 159–170, 1997.

    Google Scholar 

  40. Y. Zhuge, H. Garcia-Molina, and J.L. Wiener, “Consistency algorithms for multi-source warehouse view maintenance,” Distributed and Parallel Databases, vol. 6, pp. 7–40, 1998.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hammer, J., Fu, L. CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees. Distributed and Parallel Databases 14, 221–254 (2003). https://doi.org/10.1023/A:1025537315785

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1025537315785

Navigation