Parallel and Grid-Based Data Mining – Algorithms, Models and Systems for High-Performance KDD

Congiusta, Antonio; Talia, Domenico; Trunfio, Paolo

doi:10.1007/978-0-387-09823-4_53

Summary

Data Mining is often a compute-intensive and time-consuming process. For this reason, several Data Mining systems have been implemented on parallel computing platforms to achieve high performance in the analysis of large data sets. Moreover, when large data repositories are coupled with the geographical distribution of data, users, and systems, more sophisticated technologies are needed to implement high-performance distributed KDD systems. Since computational Grids have emerged as privileged platforms for distributed computing, a growing number of Grid-based KDD systems have been proposed. In this chapter we first discuss different ways to exploit parallelism in the main Data Mining techniques and algorithms, and then we discuss Grid-based KDD systems. Finally, we introduce the Knowledge Grid, an environment that uses standard Grid middleware to support the development of parallel and distributed knowledge discovery applications.

Author information

Authors and Affiliations

DEIS – University of Calabria, Cosenza, Italy
Antonio Congiusta, Domenico Talia & Paolo Trunfio

Corresponding author

Correspondence to Antonio Congiusta.

Editor information

Editors and Affiliations

Dept. Industrial Engineering, Tel Aviv University, Ramat Aviv, 69978, Israel
Oded Maimon
Dept. Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
Lior Rokach

About this chapter

Cite this chapter

Congiusta, A., Talia, D., Trunfio, P. (2009). Parallel and Grid-Based Data Mining – Algorithms, Models and Systems for High-Performance KDD. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_53

DOI: https://doi.org/10.1007/978-0-387-09823-4_53
Published: 07 July 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09822-7
Online ISBN: 978-0-387-09823-4
eBook Packages: Computer Science (R0)
