Abstract
This work introduces JaCa-DDM, a novel distributed data mining system founded on the agents and artifacts paradigm, conceived to design, implement, deploy, and evaluate learning strategies. Jason rational agents conform to such strategies to cope with distributed computing environments, where CArtAgO artifacts encapsulate learning algorithms, data sources, evaluation tools, and other services implemented in Weka for data mining tasks. The set of strategies presented in this paper aims at encouraging the use of JaCa-DDM to develop new ones, suited to different needs. To this end, our system provides tools to evaluate the resulting models in terms of accuracy, number of instances employed to learn, time of convergence, and volume of communications. Although the emphasis is on decision trees, JaCa-DDM can be easily extended by adopting new artifacts, e.g., for meta-learning. The main contributions of the paper are as follows: (i) from the multi-agent systems perspective, our approach illustrates how to exploit the so-called “agentification” of Weka for the sake of code reusability, while preserving the benefits of reasoning at the Belief–Desire–Intention level with Jason; (ii) from the data mining perspective, JaCa-DDM is promoted as an extensible tool to define and test distributed strategies; and (iii) a set of strategies, including centralizing, meta-learning, and Windowing-based approaches, is carefully analyzed to provide comparisons among them.
Notes
A tutorial is available at https://sourceforge.net/p/jacaddm/wiki.
Funding
The first author was funded by Conacyt Scholarship 362384. This work has been partly supported by the Spanish Ministry of Science and Innovation through Project TIN2015-66972-C5-5-R.
Appendix
Table 5 shows the results for all the datasets. The column DS refers to the indexes for the datasets in Table 3. Displayed values are the average of 20 runs (two repetitions of a tenfold stratified cross-validation). Time is measured in seconds and traffic in megabytes. Remember that the strategies round, round counter, and parallel round counter use the VFDT learning algorithm.
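The averaging protocol described above (two repetitions of a tenfold stratified cross-validation, giving 20 runs per reported value) can be sketched as follows. This is only an illustration in plain Python, not the JaCa-DDM or Weka implementation: the function names, the toy labels, and the majority-class baseline that stands in for a learning strategy are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed):
    """Split instance indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of each class round-robin over the folds.
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds

def repeated_stratified_cv(labels, k=10, repetitions=2, seed=0):
    """Yield (train, test) index lists for each of the k * repetitions runs."""
    for rep in range(repetitions):
        folds = stratified_folds(labels, k, seed + rep)
        for test_fold in range(k):
            test = sorted(folds[test_fold])
            train = sorted(i for f, fold in enumerate(folds)
                           if f != test_fold for i in fold)
            yield train, test

def majority_label(labels, indices):
    """Hypothetical stand-in for a learned model: the most frequent training label."""
    counts = defaultdict(int)
    for i in indices:
        counts[labels[i]] += 1
    return max(sorted(counts), key=counts.get)

# Toy dataset (assumption): 60 instances of one class and 40 of another.
labels = ["yes"] * 60 + ["no"] * 40
runs = list(repeated_stratified_cv(labels, k=10, repetitions=2, seed=0))

accuracies = []
for train, test in runs:
    predicted = majority_label(labels, train)
    correct = sum(1 for i in test if labels[i] == predicted)
    accuracies.append(correct / len(test))

# The reported figure is the mean over all 20 runs.
mean_accuracy = sum(accuracies) / len(accuracies)
```

In this sketch each repetition reshuffles the data with a different seed before building the folds, so the two repetitions produce different partitions while every fold preserves the class proportions of the full dataset.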
Cite this article
Limón, X., Guerra-Hernández, A., Cruz-Ramírez, N. et al. Modeling and implementing distributed data mining strategies in JaCa-DDM. Knowl Inf Syst 60, 99–143 (2019). https://doi.org/10.1007/s10115-018-1222-x
DOI: https://doi.org/10.1007/s10115-018-1222-x