High Performance OLAP and Data Mining on Parallel Computers

Data Mining and Knowledge Discovery

Abstract

On-Line Analytical Processing (OLAP) techniques are increasingly used in decision support systems to analyze data. Queries posed to such systems are complex and require different views of the data. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. Multidimensional OLAP systems store data in multidimensional arrays on which analytical operations are performed. Knowledge discovery and data mining require complex operations on the underlying data, which can be very expensive in computation time. High-performance parallel systems can reduce this analysis time.

Precomputed aggregate calculations in a data cube can provide efficient query processing for OLAP applications. In this article, we present algorithms for constructing data cubes on distributed-memory parallel computers. Data is loaded from a relational database into a multidimensional array. We present two methods for loading the base cube, sort-based and hash-based, and compare their performance. Data cubes are then used to perform consolidation queries in roll-up operations over dimension hierarchies. Finally, we show how data cubes are used for data mining with Attribute Focusing techniques. We present results for these on the IBM-SP2 parallel machine. The results show that our algorithms and techniques for OLAP and data mining on parallel systems scale to a large number of processors, providing a high-performance platform for such applications.
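To make the terms in the abstract concrete, the following is a minimal single-process sketch of loading a base cube by hashing and by sorting, computing all group-by aggregates, and rolling one dimension up a hierarchy. It is not the parallel, distributed-memory algorithm of the article: the fact table `ROWS`, the hierarchy `HALF`, and all function names are hypothetical and chosen only for illustration, and a dictionary stands in for the multidimensional array.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (product, region, quarter, sales).
ROWS = [
    ("pen", "east", "Q1", 10),
    ("pen", "west", "Q1", 5),
    ("ink", "east", "Q2", 7),
    ("pen", "east", "Q2", 3),
    ("ink", "west", "Q1", 4),
    ("pen", "east", "Q1", 2),  # same cell as the first row
]


def load_hash_based(rows):
    """Build the base cube with a hash table keyed by the dimension values."""
    cube = defaultdict(int)
    for *key, measure in rows:
        cube[tuple(key)] += measure
    return dict(cube)


def load_sort_based(rows):
    """Build the base cube by sorting the rows and merging adjacent equal keys."""
    cube = {}
    prev = None
    for *key, measure in sorted(rows):
        key = tuple(key)
        if key == prev:
            cube[key] += measure
        else:
            cube[key] = measure
            prev = key
    return cube


def group_by(base, keep):
    """Aggregate the base cube onto the dimensions whose indices are in keep."""
    out = defaultdict(int)
    for key, value in base.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)


def data_cube(base, ndims):
    """Compute all 2**ndims group-bys of the base cube."""
    return {
        keep: group_by(base, keep)
        for r in range(ndims + 1)
        for keep in combinations(range(ndims), r)
    }


# Hypothetical dimension hierarchy for roll-up: quarter -> half-year.
HALF = {"Q1": "H1", "Q2": "H1", "Q3": "H2", "Q4": "H2"}


def roll_up(base, dim, hierarchy):
    """Consolidate dimension dim to its parent level in the hierarchy."""
    out = defaultdict(int)
    for key, value in base.items():
        parent = key[:dim] + (hierarchy[key[dim]],) + key[dim + 1:]
        out[parent] += value
    return dict(out)


base = load_hash_based(ROWS)
cube = data_cube(base, 3)
by_half = roll_up(base, 2, HALF)
```

Both loaders produce the same base cube; they differ only in how duplicate cells are found. Every group-by here is computed directly from the base cube, which is the simplest choice but not necessarily the cheapest.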

About this article

Cite this article

Goil, S., Choudhary, A. High Performance OLAP and Data Mining on Parallel Computers. Data Mining and Knowledge Discovery 1, 391–417 (1997). https://doi.org/10.1023/A:1009777418785

  • DOI: https://doi.org/10.1023/A:1009777418785
