Modeling and implementing distributed data mining strategies in JaCa-DDM

Limón, Xavier; Guerra-Hernández, Alejandro; Cruz-Ramírez, Nicandro; Grimaldo, Francisco

doi:10.1007/s10115-018-1222-x

Regular Paper
Published: 20 June 2018

Volume 60, pages 99–143, (2019)

Knowledge and Information Systems

Xavier Limón¹, Alejandro Guerra-Hernández¹, Nicandro Cruz-Ramírez¹ & Francisco Grimaldo²

Abstract

This work introduces JaCa-DDM, a novel distributed data mining system founded on the agents and artifacts paradigm, conceived to design, implement, deploy, and evaluate learning strategies. Jason rational agents enact such strategies to cope with distributed computing environments, where CArtAgO artifacts encapsulate learning algorithms, data sources, evaluation tools, and other services implemented in Weka for data mining tasks. The set of strategies presented in this paper aims at encouraging the use of JaCa-DDM to develop new ones, suited to different needs. To this end, our system provides tools to evaluate the resulting models in terms of accuracy, number of instances employed to learn, time of convergence, and volume of communications. Although the emphasis is on decision trees, JaCa-DDM can be easily extended by adopting new artifacts, e.g., for meta-learning. The main contributions of the paper are as follows: (i) from the multi-agent systems perspective, our approach illustrates how to exploit the so-called “agentification” of Weka for the sake of code reusability, while preserving the benefits of reasoning at the Belief–Desire–Intention level with Jason; (ii) from the data mining perspective, JaCa-DDM is promoted as an extensible tool to define and test distributed strategies; and (iii) a set of strategies, including centralizing, meta-learning, and Windowing-based approaches, is carefully analyzed to provide comparisons among them.

Notes

http://jacaddm.sourceforge.net/.
A tutorial is available at https://sourceforge.net/p/jacaddm/wiki.

Funding

The first author was funded by Conacyt Scholarship 362384. This work has been partly supported by the Spanish Ministry of Science and Innovation through Project TIN2015-66972-C5-5-R.

Author information

Authors and Affiliations

Universidad Veracruzana, Centro de Investigación en Inteligencia Artificial, Sebastián Camacho No 5, Xalapa, Ver., México, 91000, Mexico
Xavier Limón, Alejandro Guerra-Hernández & Nicandro Cruz-Ramírez
Departament d’Informàtica, Universitat de València, Avinguda de la Universitat, s/n, Burjassot, València, 46100, Spain
Francisco Grimaldo

Corresponding author

Correspondence to Alejandro Guerra-Hernández.

Appendix

Table 5 Experimental results

Table 5 shows the results for all the datasets. The column DS refers to the indexes for the datasets in Table 3. Displayed values are the average of 20 runs (two repetitions of a tenfold stratified cross-validation). Time is measured in seconds and traffic in megabytes. Remember that the strategies round, round counter, and parallel round counter use the VFDT learning algorithm.
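
The evaluation protocol described above (two repetitions of a tenfold stratified cross-validation, giving 20 runs whose results are averaged) can be illustrated with a minimal sketch. This is not the JaCa-DDM implementation, which relies on Weka; the function names `stratified_folds`, `repeated_cv`, and `majority_class_accuracy`, the toy label list, and the majority-class baseline used as a stand-in learner are all assumptions introduced here for illustration only.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_folds(labels, k, rng):
    """Split instance indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def repeated_cv(labels, evaluate, k=10, repetitions=2, seed=0):
    """Return one score per (repetition, fold) pair: k * repetitions runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repetitions):
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            scores.append(evaluate(train_idx, test_idx))
    return scores


def majority_class_accuracy(labels):
    """Hypothetical stand-in learner: always predict the majority training class."""
    def evaluate(train_idx, test_idx):
        counts = defaultdict(int)
        for i in train_idx:
            counts[labels[i]] += 1
        predicted = max(sorted(counts), key=counts.get)
        return sum(labels[i] == predicted for i in test_idx) / len(test_idx)
    return evaluate


# Toy data (an assumption, not one of the paper's datasets).
labels = ["a"] * 60 + ["b"] * 40
scores = repeated_cv(labels, majority_class_accuracy(labels))
average = mean(scores)  # the kind of per-cell average reported in the table
```

Each value in the table would then correspond to one such average, computed for a given dataset, strategy, and measure (accuracy, time, or traffic) over the 20 runs.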

About this article

Cite this article

Limón, X., Guerra-Hernández, A., Cruz-Ramírez, N. et al. Modeling and implementing distributed data mining strategies in JaCa-DDM. Knowl Inf Syst 60, 99–143 (2019). https://doi.org/10.1007/s10115-018-1222-x

Received: 06 March 2015
Accepted: 10 May 2018
Published: 20 June 2018
Issue Date: 01 July 2019
DOI: https://doi.org/10.1007/s10115-018-1222-x
