A Survey of Open Source Data Mining Systems

  • Conference paper
Emerging Technologies in Knowledge Discovery and Data Mining (PAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4819)

Abstract

Open source data mining software represents a new trend in data mining research, education and industrial applications, especially in small and medium enterprises (SMEs). With open source software, an enterprise can easily initiate a data mining project using the most current technology. Often the software is available at no cost, allowing the enterprise to focus instead on ensuring that its staff can freely learn the data mining techniques and methods. Open source ensures that staff can understand exactly how the algorithms work by examining the source code, if they so desire, and can also fine-tune the algorithms to suit the specific purposes of the enterprise. However, diversity, instability, scalability and poor documentation can be major concerns in using open source data mining systems. In this paper, we survey open source data mining systems currently available on the Internet. We compare 12 open source systems across several aspects, including general characteristics, data source accessibility, data mining functionality, and usability. We discuss the advantages and disadvantages of these open source data mining systems.

This paper was supported by the National Natural Science Foundation of China (NSFC) under grant No. 60603066.

Editor information

Takashi Washio, Zhi-Hua Zhou, Joshua Zhexue Huang, Xiaohua Hu, Jinyan Li, Chao Xie, Jieyue He, Deqing Zou, Kuan-Ching Li, Mário M. Freire

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, X., Ye, Y., Williams, G., Xu, X. (2007). A Survey of Open Source Data Mining Systems. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_2

  • DOI: https://doi.org/10.1007/978-3-540-77018-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77016-9

  • Online ISBN: 978-3-540-77018-3

  • eBook Packages: Computer Science, Computer Science (R0)
