Abstract
Distributed computing and data mining are essential to many commercial and scientific organizations. Data mining, the process of building analytical models of data, consumes considerable time and hardware resources, and distribution is often part of an organization's structure. The authors propose a methodology for distributed data mining that combines local analytical models (built in parallel on the nodes of a distributed computer system) into a global model, without the need to construct a distributed version of the data mining algorithm. Several combining strategies are proposed, along with a method for verifying them. The proposed solutions were tested on data sets from the UCI Machine Learning Repository.
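The general idea of combining local models into a global one can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which combining strategies the authors use, so majority voting is swapped in here as one simple possibility. The toy one-attribute classifier, the sample data, and all names (`build_local_model`, `combine_by_voting`, `partitions`) are hypothetical.

```python
from collections import Counter, defaultdict


def build_local_model(rows):
    """Build a toy local model on one node's data.

    Assumption: the model maps each value of the first attribute to the
    class label most often seen with it on this node. A real system would
    use a full classifier such as a decision tree.
    """
    counts = defaultdict(Counter)
    for features, label in rows:
        counts[features[0]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def combine_by_voting(local_models):
    """Combine local models into a global predictor by majority vote.

    Assumption: voting is one possible combining strategy, not necessarily
    one of the strategies proposed in the paper.
    """
    def predict(features):
        votes = Counter(
            model[features[0]]
            for model in local_models
            if features[0] in model
        )
        # Return None when no local model has seen this attribute value.
        return votes.most_common(1)[0][0] if votes else None

    return predict


# Hypothetical data held by three nodes: ((attribute values), class label).
partitions = [
    [(("sunny",), "no"), (("sunny",), "no"), (("rainy",), "yes")],
    [(("sunny",), "no"), (("rainy",), "yes"), (("rainy",), "yes")],
    [(("sunny",), "yes"), (("rainy",), "yes")],
]

# Each local model depends only on its own partition, so these calls could
# run in parallel on separate nodes; here they run one after another.
local_models = [build_local_model(rows) for rows in partitions]
global_model = combine_by_voting(local_models)

print(global_model(("sunny",)))
print(global_model(("rainy",)))
```

Only the local models, not the raw data, need to reach the combining step, which is why a distributed version of the learning algorithm itself is not required in this sketch.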
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gorawski, M., Płuciennik-Psota, E. (2009). Distributed Data Mining Methodology with Classification Model Example. In: Nguyen, N.T., Kowalczyk, R., Chen, SM. (eds) Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems. ICCCI 2009. Lecture Notes in Computer Science, vol 5796. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04441-0_9
DOI: https://doi.org/10.1007/978-3-642-04441-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04440-3
Online ISBN: 978-3-642-04441-0
eBook Packages: Computer Science, Computer Science (R0)