Java as a basis for parallel data mining in workstation clusters

  • Workshop: Java in HPC
  • Conference paper
High-Performance Computing and Networking (HPCN-Europe 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1593)

Abstract

The exploitation of hidden information from large datasets by means of data mining techniques suffers from long response times. We address this problem by using the processing power of workstation clusters and have studied the performance of OLAP queries as a first step towards a portable data mining platform.

The results of our study suggest that, on parallel workstation clusters equipped with high-performance communication networks, fine-grained and communication-intensive parallelizations of queries are promising, even though they are considered too costly in traditional database systems.

The paper describes our Java framework for parallel OLAP-type query execution and the necessary optimizations to the standard Java implementation, and it analyzes the performance of non-standard parallel execution schemes on a workstation cluster.
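
To make the idea of parallel OLAP-type query execution concrete, the following is a minimal sketch of a partitioned aggregation in plain Java. It is not the framework described in the paper: the class `ParallelGroupBySketch`, the `Sale` row type, and the method `parallelSumByRegion` are hypothetical names chosen for illustration, threads in a local pool stand in for cluster nodes, and the code uses Java 8 features that did not exist in 1999. Each worker computes a partial `SUM ... GROUP BY` over its partition, and the partial results are merged at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: local threads play the role of cluster nodes.
public class ParallelGroupBySketch {

    /** One row of a hypothetical fact table: a dimension value and a measure. */
    public static final class Sale {
        final String region;
        final long amount;

        public Sale(String region, long amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    /** Computes SUM(amount) GROUP BY region over a single partition. */
    static Map<String, Long> aggregate(List<Sale> partition) {
        Map<String, Long> sums = new HashMap<>();
        for (Sale s : partition) {
            sums.merge(s.region, s.amount, Long::sum);
        }
        return sums;
    }

    /** Splits the rows into contiguous partitions, aggregates them in parallel, and merges the partial sums. */
    public static Map<String, Long> parallelSumByRegion(List<Sale> rows, int workers)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Ceiling division so that every row lands in exactly one partition.
            int chunk = (rows.size() + workers - 1) / workers;
            List<Future<Map<String, Long>>> partials = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += chunk) {
                List<Sale> partition =
                        rows.subList(start, Math.min(start + chunk, rows.size()));
                Callable<Map<String, Long>> task = () -> aggregate(partition);
                partials.add(pool.submit(task));
            }
            // Merge step: combine the per-partition results into the final answer.
            Map<String, Long> total = new HashMap<>();
            for (Future<Map<String, Long>> partial : partials) {
                partial.get().forEach((region, sum) -> total.merge(region, sum, Long::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Sale> rows = Arrays.asList(
                new Sale("north", 10), new Sale("south", 5), new Sale("north", 30),
                new Sale("east", 7), new Sale("south", 15));
        // TreeMap gives a stable key order for printing.
        System.out.println(new TreeMap<>(parallelSumByRegion(rows, 3)));
    }
}
```

In this coarse-grained variant the workers exchange data only once, in the merge step. The fine-grained, communication-intensive schemes the abstract refers to would exchange intermediate results far more often, which is why the speed of the cluster interconnect matters for them.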

Editor information

Peter Sloot, Marian Bubak, Alfons Hoekstra, Bob Hertzberger

Copyright information

© 1999 Springer-Verlag

About this paper

Cite this paper

Gimbel, M. (1999). Java as a basis for parallel data mining in workstation clusters. In: Sloot, P., Bubak, M., Hoekstra, A., Hertzberger, B. (eds) High-Performance Computing and Networking. HPCN-Europe 1999. Lecture Notes in Computer Science, vol 1593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100648

  • DOI: https://doi.org/10.1007/BFb0100648

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65821-4

  • Online ISBN: 978-3-540-48933-7

  • eBook Packages: Springer Book Archive
