PnP: sequential, external memory, and parallel iceberg cube computation

Distributed and Parallel Databases

Abstract

We present “Pipe ’n Prune” (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient for all of the following scenarios: (1) Sequential iceberg-cube queries, (2) External memory iceberg-cube queries, and (3) Parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks.

We performed an extensive performance analysis of PnP for these scenarios, with the following main results. In the first scenario, PnP performs very well for both dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario, PnP handles disk I/O surprisingly efficiently, with an external memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario, PnP scales very well, providing near-linear speedup for larger numbers of processors and thereby solving the scalability problem observed for the parallel iceberg-cube methods proposed by Ng et al.
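To make the terms in the abstract concrete, the following minimal sketch shows what an iceberg-cube query computes and how bottom-up a priori pruning cuts the search. It is an illustration only and is not the PnP algorithm: it omits the top-down piping, external memory handling, and parallelism described above. The function name `iceberg_cube`, the tuple-based row format, and the use of a plain count as the aggregate are assumptions made for this example.

```python
def iceberg_cube(rows, num_dims, min_support):
    """Return {(dims, values): count} for every group with count >= min_support.

    Hypothetical illustration of an iceberg-cube query with bottom-up
    a priori pruning; it is not the PnP method from the paper.
    """
    result = {}

    def expand(partition, dims, values, start):
        # Record the current group; callers guarantee it meets min_support.
        result[(dims, values)] = len(partition)
        # Refine the group by each remaining dimension in turn.
        for d in range(start, num_dims):
            buckets = {}
            for row in partition:
                buckets.setdefault(row[d], []).append(row)
            for v, bucket in buckets.items():
                # A priori pruning: if a group is below min_support, every
                # refinement of it is too, so the whole subtree is skipped.
                if len(bucket) >= min_support:
                    expand(bucket, dims + (d,), values + (v,), d + 1)

    if len(rows) >= min_support:
        expand(rows, (), (), 0)
    return result


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
    for key, count in sorted(iceberg_cube(data, 2, 2).items()):
        print(key, count)
```

Each key pairs the grouped dimension indices with their values, so `((0, 1), ("a", "x"))` is the group-by on both dimensions restricted to the cell `("a", "x")`. Groups such as `("b",)` never appear because their count falls below the threshold, and none of their refinements are ever examined.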


References

  1. Agarwal, S., Agrawal, R., Deshpande, P., Gupta, A., Naughton, J., Ramakrishnan, R., Sarawagi, S.: On the computation of multidimensional aggregates. In: Proceedings of the 22nd International VLDB Conference, pp. 506–521 (1996)

  2. Beyer, K., Ramakrishnan, R.: Bottom-up computation of sparse and iceberg cubes. In: Proceedings of the 1999 ACM SIGMOD Conference, pp. 359–370 (1999)

  3. Chen, Y., Dehne, F., Eavis, T., Rau-Chaplin, A.: Building large ROLAP data cubes in parallel. In: Proceedings of the 8th International Database Engineering and Applications Symposium (IDEAS ’04) (2004)

  4. Chen, Y., Dehne, F., Eavis, T., Rau-Chaplin, A.: Parallel ROLAP data cube construction on shared-nothing multiprocessors. Distributed Parallel Databases 15, 219–236 (2004)


  5. Codd, E.F.: Providing OLAP (on-line analytical processing) to user-analysts: an IT mandate. Technical report, E.F. Codd and Associates (1993)

  6. Dehne, F., Eavis, T., Hambrusch, S., Rau-Chaplin, A.: Parallelizing the datacube. In: International Conference on Database Theory (2001)

  7. Dehne, F., Eavis, T., Rau-Chaplin, A.: A cluster architecture for parallel data warehousing. In: International Conference on Cluster Computing and the Grid (CCGRID 2001) (2001)

  8. Dehne, F., Eavis, T., Rau-Chaplin, A.: Computing partial data cubes for parallel data warehousing applications. In: Euro PVM/MPI 2001 (2001)

  9. Dehne, F., Eavis, T., Rau-Chaplin, A.: Parallelizing the datacube. Distributed Parallel Databases 11(2), 181–201 (2002)


  10. DeWitt, D., Gray, J.: Parallel database systems: the future of high performance database systems. Commun. ACM 35(6), 85–98 (1992)


  11. Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., Ullman, J.: Computing iceberg queries efficiently. In: Proceedings VLDB, pp. 299–310 (1998)

  12. Goil, S., Choudhary, A.: High performance OLAP and data mining on parallel computers. J. Data Min. Knowl. Discov. (4) (1997)

  13. Goil, S., Choudhary, A.: High performance multidimensional analysis of large datasets. In: Proceedings of the First ACM International Workshop on Data Warehousing and OLAP, pp. 34–39 (1998)

  14. Goil, S., Choudhary, A.: A parallel scalable infrastructure for OLAP and data mining. In: International Database Engineering and Application Symposium, pp. 178–186 (1999)

  15. Gray, J., Bosworth, A., Layman, A., Pirahesh, H.: Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. In: Proceedings of the 12th International Conference on Data Engineering, pp. 152–159 (1996)

  16. Han, J.: Software download site. http://www-sal.cs.uiuc.edu/hanj/pubs/software.htm

  17. Harinarayan, V., Rajaraman, A., Ullman, J.: Implementing data cubes. In: Proceedings of the 1996 ACM SIGMOD Conference, pp. 205–216 (1996)

  18. HYDRO1k Elevation Derivative Database. http://edcdaac.usgs.gov/gtopo30/hydro/index.asp. Last visited: Oct. 25th, 2006

  19. Lakshmanan, L.V.S., Pei, J., Han, J.: Quotient cube: How to summarize the semantics of a data cube. In: Proceedings of the 28th VLDB Conference (2002)

  20. Lakshmanan, L.V.S., Pei, J., Zhao, Y.: QC-trees: An efficient summary structure for semantic OLAP. In: Proceedings of the 2003 ACM SIGMOD Conference, pp. 64–75 (2003)

  21. Lu, H., Yu, J.X., Feng, L., Li, X.: Fully dynamic partitioning: Handling data skew in parallel data cube computation. Distributed Parallel Databases 13, 181–202 (2003)


  22. Muto, S., Kitsuregawa, M.: A dynamic load balancing strategy for parallel datacube computation. In: ACM 2nd Annual Workshop on Data Warehousing and OLAP, pp. 67–72 (1999)

  23. Ng, R., Wagner, A., Yin, Y.: Iceberg-cube computation with PC clusters. In: Proceedings of 2001 ACM SIGMOD Conference on Management of Data, pp. 25–36 (2001)

  24. Ross, K., Srivastava, D.: Fast computation of sparse data cubes. In: Proceedings of the 23rd VLDB Conference, pp. 116–125 (1997)

  25. Roussopoulos, N., Kotidis, Y., Roussopoulos, M.: Cubetree: Organization of and bulk incremental updates on the data cube. In: Proceedings of the 1997 ACM SIGMOD Conference, pp. 89–99 (1997)

  26. Sarawagi, S., Agrawal, R., Gupta, A.: On computing the data cube. Technical Report RJ10026, IBM Almaden Research Center, San Jose, California (1996)

  27. Sismanis, Y., Deligiannakis, A., Roussopoulos, N., Kotidis, Y.: Dwarf: Shrinking the petacube. In: Proceedings of the 2002 ACM SIGMOD Conference, pp. 464–475 (2002)

  28. Wang, W., Feng, J., Lu, H., Yu, J.X.: Condensed cube: An effective approach to reducing data cube size. In: Proceedings of the International Conference on Data Engineering (2002)

  29. Winter Corporation. 2005 Top Ten Program Summary: The survey of the world’s largest databases. http://www.wintercorp.com/WhitePapers/WC_TopTenWP.pdf. Last visited Oct. 25th 2006

  30. Xin, D., Han, J., Li, X., Wah, B.W.: Star-cubing: Computing iceberg cubes by top-down and bottom-up integration. In: Proceedings Int. Conf. on Very Large Data Bases (VLDB’03) (2003)

  31. Zhao, Y., Deshpande, P., Naughton, J.: An array-based algorithm for simultaneous multi-dimensional aggregates. In: Proceedings of the 1997 ACM SIGMOD Conference, pp. 159–170 (1997)


Author information

Corresponding author

Correspondence to Andrew Rau-Chaplin.

Additional information

Research partially supported by the Natural Sciences and Engineering Research Council of Canada. A preliminary version of this work appeared in the International Conference on Data Engineering (ICDE’05).

About this article

Cite this article

Chen, Y., Dehne, F., Eavis, T. et al. PnP: sequential, external memory, and parallel iceberg cube computation. Distrib Parallel Databases 23, 99–126 (2008). https://doi.org/10.1007/s10619-007-7023-y

  • DOI: https://doi.org/10.1007/s10619-007-7023-y
