Abstract
Extracting hidden information from large datasets with data mining techniques suffers from long response times. We address this problem by using the processing power of workstation clusters and, as a first step towards a portable data mining platform, have studied the performance of OLAP queries.
Our results suggest that on parallel workstation clusters equipped with high-performance communication networks, fine-grained and communication-intensive parallelizations of queries are promising, even though they are considered too costly in traditional database systems.
The paper describes our Java framework for parallel OLAP-type query execution and the necessary optimizations to the standard Java implementation, and it analyzes the performance of non-standard parallel execution schemes on a workstation cluster.
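To make the idea of a parallel OLAP-type query concrete, the following is a minimal sketch of a partitioned group-by sum in plain Java. It is not the framework described in the paper: the class `ParallelAggregationSketch`, the `Sale` row type, and the method `parallelGroupSum` are hypothetical names chosen for illustration, and local threads stand in for the cluster nodes, so no network communication takes place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: threads play the role of cluster nodes.
final class ParallelAggregationSketch {

    /** One fact-table row with a grouping key and a measure (hypothetical schema). */
    static final class Sale {
        final String region;
        final long amount;

        Sale(String region, long amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    /** Computes SUM(amount) GROUP BY region over horizontal partitions of the rows. */
    static Map<String, Long> parallelGroupSum(List<Sale> rows, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, Long>>> partials = new ArrayList<>();
            int chunk = (rows.size() + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                int from = Math.min(w * chunk, rows.size());
                int to = Math.min(from + chunk, rows.size());
                List<Sale> partition = rows.subList(from, to);
                // Each worker aggregates its own partition into a private map.
                partials.add(pool.submit(() -> {
                    Map<String, Long> local = new HashMap<>();
                    for (Sale s : partition) {
                        local.merge(s.region, s.amount, Long::sum);
                    }
                    return local;
                }));
            }
            // Merge step: combine the partial aggregates into the final result.
            Map<String, Long> result = new HashMap<>();
            for (Future<Map<String, Long>> partial : partials) {
                partial.get().forEach((key, sum) -> result.merge(key, sum, Long::sum));
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Sale> rows = Arrays.asList(
                new Sale("north", 10), new Sale("south", 5),
                new Sale("north", 30), new Sale("east", 7));
        System.out.println(parallelGroupSum(rows, 2));
    }
}
```

In a cluster setting the merge step is where communication cost arises, because each partial aggregate has to be serialized and sent to the node that combines them. That is why the speed of the network and of object serialization matters for fine-grained schemes.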
© 1999 Springer-Verlag
About this paper
Cite this paper
Gimbel, M. (1999). Java as a basis for parallel data mining in workstation clusters. In: Sloot, P., Bubak, M., Hoekstra, A., Hertzberger, B. (eds) High-Performance Computing and Networking. HPCN-Europe 1999. Lecture Notes in Computer Science, vol 1593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100648
DOI: https://doi.org/10.1007/BFb0100648
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65821-4
Online ISBN: 978-3-540-48933-7
eBook Packages: Springer Book Archive