PnP: sequential, external memory, and parallel iceberg cube computation

Chen, Ying; Dehne, Frank; Eavis, Todd; Rau-Chaplin, Andrew

doi:10.1007/s10619-007-7023-y

Published: 03 January 2008

Distributed and Parallel Databases, Volume 23, pages 99–126 (2008)

Abstract

We present “Pipe ’n Prune” (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient for all of the following scenarios: (1) Sequential iceberg-cube queries, (2) External memory iceberg-cube queries, and (3) Parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks.

We performed an extensive performance analysis of PnP for the above scenarios with the following main results: In the first scenario PnP performs very well for both dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario PnP shows a surprisingly efficient handling of disk I/O, with an external memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario PnP scales very well, providing near linear speedup for a larger number of processors and thereby solving the scalability problem observed for the parallel iceberg-cubes proposed by Ng et al.
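
To make the terms in the abstract concrete, the following is a minimal sketch of an iceberg-cube query with bottom-up a priori pruning. It is not the PnP algorithm, which additionally integrates top-down piping and is described only in the full article. It assumes the aggregate is COUNT with a minimum-support threshold; the function name `iceberg_cube`, its parameters, and the sample rows are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def iceberg_cube(rows, num_dims, min_support):
    """Map each group-by (a tuple of dimension indices) to the cells
    whose COUNT is at least min_support."""
    cube = {}
    # Process group-bys from few dimensions to many (bottom-up).
    for k in range(1, num_dims + 1):
        for group in combinations(range(num_dims), k):
            # Parents: the same group-by with one dimension removed.
            parents = (
                [tuple(d for d in group if d != drop) for drop in group]
                if k > 1
                else []
            )
            counts = Counter()
            for row in rows:
                # A priori pruning: a cell can only reach min_support if
                # each of its parent cells did, so skip rows that fail.
                if all(tuple(row[d] for d in p) in cube[p] for p in parents):
                    counts[tuple(row[d] for d in group)] += 1
            cube[group] = {c: n for c, n in counts.items() if n >= min_support}
    return cube


# Illustrative data: three dimensions, five rows (made up for this sketch).
rows = [
    ("a", "x", 1),
    ("a", "x", 1),
    ("a", "y", 2),
    ("b", "x", 1),
    ("b", "z", 3),
]
cube = iceberg_cube(rows, num_dims=3, min_support=2)
for group, cells in cube.items():
    print(group, cells)
```

The pruning is valid because COUNT is anti-monotone: adding a dimension to a group-by can only split a cell into smaller ones, so a cell below the threshold cannot have a descendant above it. This sketch rescans all rows for every group-by, which is exactly the kind of repeated work that piping-based methods aim to avoid.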

References

Agarwal, S., Agrawal, R., Deshpande, P., Gupta, A., Naughton, J., Ramakrishnan, R., Sarawagi, S.: On the computation of multidimensional aggregates. In: Proceedings of the 22nd International VLDB Conference, pp. 506–521 (1996)
Beyer, K., Ramakrishnan, R.: Bottom-up computation of sparse and iceberg cubes. In: Proceedings of the 1999 ACM SIGMOD Conference, pp. 359–370 (1999)
Chen, Y., Dehne, F., Eavis, T., Rau-Chaplin, A.: Building large ROLAP data cubes in parallel. In: Proceedings of the 8th International Database Engineering and Applications Symposium (IDEAS ’04) (2004)
Chen, Y., Dehne, F., Eavis, T., Rau-Chaplin, A.: Parallel ROLAP data cube construction on shared-nothing multiprocessors. Distributed Parallel Databases 15, 219–236 (2004)
Codd, E.F.: Providing OLAP (on-line analytical processing) to user-analysts: an IT mandate. Technical report, E.F. Codd and Associates (1993)
Dehne, F., Eavis, T., Hambrusch, S., Rau-Chaplin, A.: Parallelizing the datacube. In: International Conference on Database Theory (2001)
Dehne, F., Eavis, T., Rau-Chaplin, A.: A cluster architecture for parallel data warehousing. In: International Conference on Cluster Computing and the Grid (CCGRID 2001) (2001)
Dehne, F., Eavis, T., Rau-Chaplin, A.: Computing partial data cubes for parallel data warehousing applications. In: Euro PVM/MPI 2001 (2001)
Dehne, F., Eavis, T., Rau-Chaplin, A.: Parallelizing the datacube. Distributed Parallel Databases 11(2), 181–201 (2002)
DeWitt, D., Gray, J.: Parallel database systems: the future of high performance database systems. Commun. ACM 35(6), 85–98 (1992)
Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., Ullman, J.: Computing iceberg queries efficiently. In: Proceedings VLDB, pp. 299–310 (1998)
Goil, S., Choudhary, A.: High performance OLAP and data mining on parallel computers. J. Data Min. Knowl. Discov. (4) (1997)
Goil, S., Choudhary, A.: High performance multidimensional analysis of large datasets. In: Proceedings of the First ACM International Workshop on Data Warehousing and OLAP, pp. 34–39 (1998)
Goil, S., Choudhary, A.: A parallel scalable infrastructure for OLAP and data mining. In: International Database Engineering and Application Symposium, pp. 178–186 (1999)
Gray, J., Bosworth, A., Layman, A., Pirahesh, H.: Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. In: Proceedings of the 12th International Conference on Data Engineering, pp. 152–159 (1996)
Han, J.: Software download site. http://www-sal.cs.uiuc.edu/hanj/pubs/software.htm
Harinarayan, V., Rajaraman, A., Ullman, J.: Implementing data cubes. In: Proceedings of the 1996 ACM SIGMOD Conference, pp. 205–216 (1996)
HYDRO1k Elevation Derivative Database. http://edcdaac.usgs.gov/gtopo30/hydro/index.asp. Last visited: Oct. 25th, 2006
Lakshmanan, L.V.S., Pei, J., Han, J.: Quotient cube: How to summarize the semantics of a data cube. In: Proceedings of the 28th VLDB Conference (2002)
Lakshmanan, L.V.S., Pei, J., Zhao, Y.: QC-trees: An efficient summary structure for semantic OLAP. In: Proceedings of the 2003 ACM SIGMOD Conference, pp. 64–75 (2003)
Lu, H., Yu, J.X., Feng, L., Li, X.: Fully dynamic partitioning: Handling data skew in parallel data cube computation. Distributed Parallel Databases 13, 181–202 (2003)
Muto, S., Kitsuregawa, M.: A dynamic load balancing strategy for parallel datacube computation. In: ACM 2nd Annual Workshop on Data Warehousing and OLAP, pp. 67–72 (1999)
Ng, R., Wagner, A., Yin, Y.: Iceberg-cube computation with PC clusters. In: Proceedings of 2001 ACM SIGMOD Conference on Management of Data, pp. 25–36 (2001)
Ross, K., Srivastava, D.: Fast computation of sparse data cubes. In: Proceedings of the 23rd VLDB Conference, pp. 116–125 (1997)
Roussopoulos, N., Kotidis, Y., Roussopoulos, M.: Cubetree: Organization of the bulk incremental updates on the data cube. In: Proceedings of the 1997 ACM SIGMOD Conference, pp. 89–99 (1997)
Sarawagi, S., Agrawal, R., Gupta, A.: On computing the data cube. Technical Report RJ10026, IBM Almaden Research Center, San Jose, California (1996)
Sismanis, Y., Deligiannakis, A., Roussopoulos, N., Kotidis, Y.: Dwarf: Shrinking the petacube. In: Proceedings of the 2002 ACM SIGMOD Conference, pp. 464–475 (2002)
Wang, W., Feng, J., Lu, H., Yu, J.X.: Condensed cube: An effective approach to reducing data cube size. In: Proceedings of the International Conference on Data Engineering (2002)
Winter Corporation. 2005 Top Ten Program Summary: The survey of the world’s largest databases. http://www.wintercorp.com/WhitePapers/WC_TopTenWP.pdf. Last visited Oct. 25th 2006
Xin, D., Han, J., Li, X., Wah, B.W.: Star-cubing: Computing iceberg cubes by top-down and bottom-up integration. In: Proceedings Int. Conf. on Very Large Data Bases (VLDB’03) (2003)
Zhao, Y., Deshpande, P., Naughton, J.: An array-based algorithm for simultaneous multi-dimensional aggregates. In: Proceedings of the 1997 ACM SIGMOD Conference, pp. 159–170 (1997)

Author information

Authors and Affiliations

Ying Chen: Microsoft Corp., Redmond, WA, USA
Frank Dehne: School of Computer Science, Carleton University, Ottawa, Canada
Todd Eavis: Department of Computer Science, Concordia University, Montreal, Canada
Andrew Rau-Chaplin: Faculty of Computer Science, Dalhousie University, Halifax, Canada

Corresponding author

Correspondence to Andrew Rau-Chaplin.

Additional information

Research partially supported by the Natural Sciences and Engineering Research Council of Canada. A preliminary version of this work appeared in the International Conference on Data Engineering (ICDE’05).

About this article

Cite this article

Chen, Y., Dehne, F., Eavis, T. et al. PnP: sequential, external memory, and parallel iceberg cube computation. Distrib Parallel Databases 23, 99–126 (2008). https://doi.org/10.1007/s10619-007-7023-y

Received: 17 November 2006
Accepted: 16 December 2007
Published: 03 January 2008
Issue Date: April 2008
DOI: https://doi.org/10.1007/s10619-007-7023-y
