Modeling and implementing distributed data mining strategies in JaCa-DDM

  • Regular Paper
  • Knowledge and Information Systems

Abstract

This work introduces JaCa-DDM, a novel distributed data mining system founded on the agents and artifacts paradigm, conceived to design, implement, deploy, and evaluate learning strategies. Jason rational agents conform to such strategies to cope with distributed computing environments, where CArtAgO artifacts encapsulate learning algorithms, data sources, evaluation tools, and other services implemented in Weka for data mining tasks. The set of strategies presented in this paper aims to encourage the use of JaCa-DDM to develop new ones, suited to different needs. To this end, the system provides tools to evaluate the resulting models in terms of accuracy, number of instances used for learning, convergence time, and volume of communications. Although the emphasis is on decision trees, JaCa-DDM can be easily extended by adopting new artifacts, e.g., for meta-learning. The main contributions of the paper are as follows: (i) from the multi-agent systems perspective, our approach illustrates how to exploit the so-called “agentification” of Weka for the sake of code reusability, while preserving the benefits of reasoning at the Belief–Desire–Intention level with Jason; (ii) from the data mining perspective, JaCa-DDM is promoted as an extensible tool to define and test distributed strategies; and (iii) a set of strategies, including centralizing, meta-learning, and Windowing-based approaches, is carefully analyzed and compared.

Notes

  1. http://jacaddm.sourceforge.net/.

  2. A tutorial is available at https://sourceforge.net/p/jacaddm/wiki.

Funding

The first author was funded by Conacyt Scholarship 362384. This work has been partly supported by the Spanish Ministry of Science and Innovation through Project TIN2015-66972-C5-5-R.

Author information

Correspondence to Alejandro Guerra-Hernández.

Appendix

Table 5 Experimental results

Table 5 shows the results for all the datasets. The column DS refers to the indexes for the datasets in Table 3. Displayed values are the average of 20 runs (two repetitions of a tenfold stratified cross-validation). Time is measured in seconds and traffic in megabytes. Remember that the strategies round, round counter, and parallel round counter use the VFDT learning algorithm.
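The evaluation protocol described above (two repetitions of a tenfold stratified cross-validation, giving 20 runs whose results are averaged) can be sketched as follows. This is a minimal illustration in plain Python, not the JaCa-DDM or Weka implementation: the function names, the toy label list, the seed, and the majority-class baseline standing in for a learned model are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, rng):
    """Split instance indexes into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    # Deal the shuffled members of each class round-robin over the folds.
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for index in members:
            folds[position % k].append(index)
            position += 1
    return folds


def repeated_stratified_cv(labels, k=10, repetitions=2, seed=0):
    """Yield (train, test) index lists: k folds for each repetition."""
    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    for _ in range(repetitions):
        folds = stratified_folds(labels, k, rng)
        for held_out in range(k):
            test = sorted(folds[held_out])
            train = sorted(
                index
                for number, fold in enumerate(folds)
                if number != held_out
                for index in fold
            )
            yield train, test


def majority_baseline_accuracy(labels, train, test):
    """Accuracy of always predicting the most frequent training class."""
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test) / len(test)


if __name__ == "__main__":
    toy_labels = ["yes"] * 60 + ["no"] * 40  # assumed toy dataset
    scores = [
        majority_baseline_accuracy(toy_labels, train, test)
        for train, test in repeated_stratified_cv(toy_labels)
    ]
    print(len(scores), "runs, mean accuracy", sum(scores) / len(scores))
```

Each reported value in a table such as Table 5 would then correspond to the mean of one measure (accuracy, time, or traffic) over the 20 train/test splits; here only accuracy of a trivial baseline is computed.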

Cite this article

Limón, X., Guerra-Hernández, A., Cruz-Ramírez, N. et al. Modeling and implementing distributed data mining strategies in JaCa-DDM. Knowl Inf Syst 60, 99–143 (2019). https://doi.org/10.1007/s10115-018-1222-x

  • DOI: https://doi.org/10.1007/s10115-018-1222-x
