Clusters and Grids for Distributed and Parallel Knowledge Discovery

Cannataro, Mario

doi:10.1007/3-540-45492-6_86

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1823)

Included in the following conference series:

International Conference on High-Performance Computing and Networking

Abstract

Parallel and Distributed Knowledge Discovery (PDKD) is emerging as a possible killer application for clusters and grids of computers. The need to process large volumes of data and the availability of parallel data mining algorithms make it possible to exploit the increasing computational power of clusters at low cost. On the other hand, grid computing is an emerging “standard” for developing and deploying distributed, high-performance applications over geographic networks, in different domains, and in particular for data-intensive applications. This paper proposes an approach to integrate clusters of computers within a grid infrastructure and to use them, enriched by specific data mining services, as the deployment platform for high-performance distributed data mining and knowledge discovery.

Author information

Authors and Affiliations

ISI-CNR, Via P. Bucci, 41/c, 87036, Rende, Italy
Mario Cannataro

Editor information

Editors and Affiliations

Institute of Computer Science and Academic Computer Center CYFRONET, University of Mining and Metallurgy (AGH), al. Mickiewicza 30, 30-059, Cracow, Poland
Marian Bubak
Faculteit der Natuurwetenschappen, Wiskunde en Informatica, Universiteit van Amsterdam, 1098 SJ, Amsterdam, The Netherlands
Hamideh Afsarmanesh & Bob Hertzberger
California Institute of Technology, Caltech 158-79, Pasadena, CA, 91125, USA
Roy Williams

Cite this paper

Cannataro, M. (2000). Clusters and Grids for Distributed and Parallel Knowledge Discovery. In: Bubak, M., Afsarmanesh, H., Hertzberger, B., Williams, R. (eds) High Performance Computing and Networking. HPCN-Europe 2000. Lecture Notes in Computer Science, vol 1823. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45492-6_86

DOI: https://doi.org/10.1007/3-540-45492-6_86
Published: 12 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67553-2
Online ISBN: 978-3-540-45492-2
eBook Packages: Springer Book Archive
