Summary
Data Mining is often a compute-intensive and time-consuming process. For this reason, several Data Mining systems have been implemented on parallel computing platforms to achieve high performance in the analysis of large data sets. Moreover, when large data repositories are coupled with the geographical distribution of data, users, and systems, more sophisticated technologies are needed to implement high-performance distributed KDD systems. Since computational Grids have emerged as privileged platforms for distributed computing, a growing number of Grid-based KDD systems have been proposed. In this chapter we first discuss different ways to exploit parallelism in the main Data Mining techniques and algorithms, and then discuss Grid-based KDD systems. Finally, we introduce the Knowledge Grid, an environment that uses standard Grid middleware to support the development of parallel and distributed knowledge discovery applications.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Congiusta, A., Talia, D., Trunfio, P. (2009). Parallel and Grid-Based Data Mining – Algorithms, Models and Systems for High-Performance KDD. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_53
DOI: https://doi.org/10.1007/978-0-387-09823-4_53
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09822-7
Online ISBN: 978-0-387-09823-4
eBook Packages: Computer Science (R0)