Distributed data mining for e-business

Information Technology and Management

Abstract

In the internet-based e-business environment, most business data are distributed, heterogeneous and private. To achieve true business intelligence, mining large amounts of distributed data is necessary. Through a thorough literature review, this paper identifies four main issues in distributed data mining (DDM) systems for e-business and classifies modern DDM systems into three classes with representative samples. To address these identified issues, this paper proposes a novel DDM model named DRHPDM (Data source Relevance-based Hierarchical Parallel Distributed data mining Model). In addition, to improve the quality of the final result, the data sources are divided into a centralized mining layer and a distributed mining layer, according to their relevance. To improve the openness, cross-platform ability, and intelligence of the DDM system, web service and multi-agent technologies are adopted. The feasibility of DRHPDM was verified by building a prototype system and applying it to a web usage mining scenario.

Acknowledgments

This research is supported by the Natural Science Foundation of Hebei Province (No. G2010000903) and the Doctor Start-up Research Fund of Hebei University of Science and Technology (No. QD200945).

Corresponding author

Correspondence to Bin Liu.

About this article

Cite this article

Liu, B., Cao, S.G. & He, W. Distributed data mining for e-business. Inf Technol Manag 12, 67–79 (2011). https://doi.org/10.1007/s10799-011-0091-8

  • DOI: https://doi.org/10.1007/s10799-011-0091-8