Parallel and Grid-Based Data Mining – Algorithms, Models and Systems for High-Performance KDD

Data Mining and Knowledge Discovery Handbook

Summary

Data Mining is often a computationally intensive and time-consuming process. For this reason, several Data Mining systems have been implemented on parallel computing platforms to achieve high performance in the analysis of large data sets. Moreover, when large data repositories are coupled with the geographical distribution of data, users, and systems, more sophisticated technologies are needed to implement high-performance distributed KDD systems. Since computational Grids have emerged as preferred platforms for distributed computing, a growing number of Grid-based KDD systems have been proposed. In this chapter we first discuss different ways to exploit parallelism in the main Data Mining techniques and algorithms, and then we discuss Grid-based KDD systems. Finally, we introduce the Knowledge Grid, an environment that uses standard Grid middleware to support the development of parallel and distributed knowledge discovery applications.
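As an illustration of the first theme, exploiting parallelism inside a Data Mining algorithm, here is a minimal Python sketch of data-parallel itemset counting in the spirit of count-distribution schemes for parallel association rule mining. It is our own simplification, not the chapter's method: each worker counts every k-itemset in its horizontal partition of the transactions, and a reduction step sums the local counts into global supports. A real system would count only Apriori-generated candidates and run workers as separate processes or cluster nodes; the threads used here (`ThreadPoolExecutor`) only show the partition-and-merge structure and give no speedup for CPU-bound Python code because of the GIL. The names `local_counts`, `split`, and `parallel_frequent_itemsets` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Transaction = Sequence[str]
Itemset = Tuple[str, ...]


def local_counts(partition: Sequence[Transaction], k: int) -> Counter:
    """Count every k-itemset that occurs in one data partition (one worker's job)."""
    counts: Counter = Counter()
    for transaction in partition:
        # sorted(set(...)) gives a canonical item order, so ("a", "b") == ("b", "a")
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def split(data: Sequence[Transaction], n_parts: int) -> List[Sequence[Transaction]]:
    """Horizontally partition the transactions into at most n_parts chunks."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_frequent_itemsets(
    data: Sequence[Transaction], k: int, min_support: int, n_workers: int = 4
) -> Dict[Itemset, int]:
    """Count k-itemsets in parallel over partitions, then merge the local counts."""
    partitions = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(lambda part: local_counts(part, k), partitions))
    # Reduction step: the global support of an itemset is the sum of its local counts.
    total: Counter = sum(partial, Counter())
    return {itemset: c for itemset, c in total.items() if c >= min_support}


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["beer", "bread"],
        ["butter", "milk"],
    ]
    print(parallel_frequent_itemsets(transactions, k=2, min_support=2, n_workers=2))
```

Because each partition is counted independently and the merge is a plain sum, the result does not depend on how the data is split; only communication of the local count tables is needed, which is what makes this strategy attractive when data is already distributed across nodes.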

Author information

Correspondence to Antonio Congiusta.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Congiusta, A., Talia, D., Trunfio, P. (2009). Parallel and Grid-Based Data Mining – Algorithms, Models and Systems for High-Performance KDD. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_53

  • DOI: https://doi.org/10.1007/978-0-387-09823-4_53

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-09822-7

  • Online ISBN: 978-0-387-09823-4

  • eBook Packages: Computer Science, Computer Science (R0)
