High Performance OLAP and Data Mining on Parallel Computers

Goil, Sanjay; Choudhary, Alok

doi:10.1023/A:1009777418785


Published: December 1997

Volume 1, pages 391–417 (1997)

Data Mining and Knowledge Discovery



Abstract

On-Line Analytical Processing (OLAP) techniques are increasingly being used in decision support systems to provide analysis of data. Queries posed on such systems are quite complex and require different views of data. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. Multidimensional OLAP systems store data in multidimensional arrays on which analytical operations are performed. Knowledge discovery and data mining require complex operations on the underlying data, which can be very expensive in terms of computation time. High performance parallel systems can reduce this analysis time.

Precomputed aggregate calculations in a Data Cube can provide efficient query processing for OLAP applications. In this article, we present algorithms for the construction of data cubes on distributed-memory parallel computers. Data is loaded from a relational database into a multidimensional array. We present two methods, sort-based and hash-based, for loading the base cube and compare their performance. Data cubes are then used to answer the consolidation queries that arise in roll-up operations over dimension hierarchies. Finally, we show how data cubes are used for data mining with Attribute Focusing techniques. We present results for these on the IBM-SP2 parallel machine. The results show that our algorithms and techniques for OLAP and data mining on parallel systems scale to a large number of processors, providing a high performance platform for such applications.
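The ideas named in the abstract (loading a base cube, precomputing aggregates, and rolling up along a dimension hierarchy) can be illustrated with a minimal serial sketch in Python. This is not the authors' parallel algorithm: the fact table `ROWS`, the dimension order, the `HALF_YEAR` hierarchy, and every function name below are hypothetical, and a dictionary keyed by coordinates stands in for the multidimensional array.

```python
from collections import defaultdict
from itertools import combinations, groupby

# Hypothetical fact table: (product, region, quarter, sales).
ROWS = [
    ("pen", "east", "Q1", 10),
    ("pen", "west", "Q1", 5),
    ("ink", "east", "Q2", 7),
    ("pen", "east", "Q2", 3),
]

# Hypothetical dimension hierarchy: quarter -> half-year.
HALF_YEAR = {"Q1": "H1", "Q2": "H1", "Q3": "H2", "Q4": "H2"}


def load_hash(rows):
    """Hash-based load: add each measure into a table keyed by coordinates."""
    base = defaultdict(int)
    for *coords, measure in rows:
        base[tuple(coords)] += measure
    return dict(base)


def load_sort(rows):
    """Sort-based load: sort by coordinates, then sum runs of equal keys."""
    ordered = sorted(rows, key=lambda r: r[:-1])
    return {
        key: sum(r[-1] for r in run)
        for key, run in groupby(ordered, key=lambda r: r[:-1])
    }


def build_cube(base, ndims):
    """Compute all 2**ndims group-bys, keyed by the kept dimension indices."""
    cube = {}
    for k in range(ndims + 1):
        for kept in combinations(range(ndims), k):
            agg = defaultdict(int)
            for coords, measure in base.items():
                agg[tuple(coords[i] for i in kept)] += measure
            cube[kept] = dict(agg)
    return cube


def roll_up(group_by, pos, hierarchy):
    """Consolidate one dimension of a group-by to its parent level."""
    out = defaultdict(int)
    for coords, measure in group_by.items():
        parent = coords[:pos] + (hierarchy[coords[pos]],) + coords[pos + 1:]
        out[parent] += measure
    return dict(out)
```

Under these assumptions, `build_cube(load_hash(ROWS), 3)` yields eight group-bys, and `roll_up(cube[(0, 2)], 1, HALF_YEAR)` consolidates the (product, quarter) view to (product, half-year). Both loaders produce the same base cube here; the sketch says nothing about their relative cost or about partitioning the array across processors, which is the subject of the article.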



Author information

Authors and Affiliations

Department of Electrical and Computer Engineering and Center for Parallel and Distributed Computing, Northwestern University, Evanston, IL, 60201
Sanjay Goil & Alok Choudhary


About this article

Cite this article

Goil, S., Choudhary, A. High Performance OLAP and Data Mining on Parallel Computers. Data Mining and Knowledge Discovery 1, 391–417 (1997). https://doi.org/10.1023/A:1009777418785


Issue Date: December 1997
DOI: https://doi.org/10.1023/A:1009777418785
