Distributed Data Mining Methodology with Classification Model Example

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5796)

Abstract

Distributed computing and data mining are both essential to many commercial and scientific organizations. Data mining, the process of building analytical models of data, is demanding in both time and hardware resources, and distribution is often already part of an organization's structure. The authors propose a methodology for distributed data mining that combines local analytical models (built in parallel on the nodes of a distributed computer system) into a global one, without the need to construct a distributed version of the data mining algorithm. Several combining strategies are proposed, together with a method for verifying them. The proposed solutions were tested on data sets from the UCI Machine Learning Repository.
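
The abstract does not spell out the combining strategies, so the following is only a minimal sketch of the general idea: each node trains a local classifier on its own partition, and a global model is assembled from the local ones without redistributing the data or rewriting the learning algorithm. The nearest-centroid local model, the majority-vote combiner, and the names `train_local_model` and `combine_by_voting` are all illustrative assumptions, not the authors' method.

```python
from collections import Counter


def train_local_model(rows):
    """Train a nearest-centroid classifier on one node's partition.

    `rows` is a list of (features, label) pairs. This local learner is an
    assumption made for illustration; any classifier could stand in for it.
    """
    sums, counts = {}, Counter()
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }

    def predict(features):
        # Pick the class whose centroid is closest (squared Euclidean distance).
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))

        return min(centroids, key=dist)

    return predict


def combine_by_voting(local_models):
    """Build a global model that returns the majority vote of the local models.

    Majority voting is a hypothetical combining strategy chosen for brevity.
    """

    def predict(features):
        votes = Counter(model(features) for model in local_models)
        return votes.most_common(1)[0][0]

    return predict


# Three partitions, as if held by three nodes of a distributed system.
partitions = [
    [((1.0, 1.0), "a"), ((5.0, 5.0), "b")],
    [((0.0, 2.0), "a"), ((6.0, 4.0), "b")],
    [((2.0, 0.0), "a"), ((4.0, 6.0), "b")],
]

# Local models could be trained in parallel; here they are trained in a loop.
local_models = [train_local_model(p) for p in partitions]
global_model = combine_by_voting(local_models)

print(global_model((0.5, 0.5)))
print(global_model((5.5, 5.0)))
```

Only the trained local models reach the combining step, which is what lets an unmodified, non-distributed learning algorithm run on each node.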

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gorawski, M., Płuciennik-Psota, E. (2009). Distributed Data Mining Methodology with Classification Model Example. In: Nguyen, N.T., Kowalczyk, R., Chen, SM. (eds) Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems. ICCCI 2009. Lecture Notes in Computer Science, vol 5796. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04441-0_9

  • DOI: https://doi.org/10.1007/978-3-642-04441-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04440-3

  • Online ISBN: 978-3-642-04441-0

  • eBook Packages: Computer Science (R0)