
Advanced Fuzzy Clustering and Decision Tree Plug-Ins for DataEngine™

Chapter in: Intelligent Systems and Soft Computing

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1804)


Abstract

Although a large variety of data analysis tools are available on the market today, none of them is perfect; they all have their strengths and weaknesses. In such a situation it is important that a user can enhance the capabilities of a data analysis tool with his or her own favourite methods in order to compensate for shortcomings of the shipped version. However, only a few commercial products offer such a possibility. A rare exception is DataEngine™, which is provided with a well-documented interface for user-defined function blocks (plug-ins). In this paper we describe three plug-ins we implemented for this well-known tool: an advanced fuzzy clustering plug-in that extends the fuzzy c-means algorithm (a built-in feature of DataEngine™) with other, more flexible algorithms; a decision tree classifier plug-in that overcomes the serious drawback that DataEngine™ lacks a native module for this highly important technique; and finally a naive Bayes classifier plug-in that makes available an old and time-tested statistical classification method.
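To illustrate the core technique behind the clustering plug-in, the following is a minimal sketch of the standard fuzzy c-means algorithm (alternating update of cluster centers and membership degrees), written in Python with NumPy. It only illustrates the algorithm named in the abstract; the function fuzzy_c_means and its parameters are our own illustrative choices and do not reflect the actual DataEngine™ plug-in interface.

    import numpy as np

    def fuzzy_c_means(X, c, m=2.0, iters=100, eps=1e-5, seed=0):
        """Minimal fuzzy c-means sketch (illustrative only, not the plug-in code).
        X is an (n, d) data array, c the number of clusters, m > 1 the fuzzifier.
        Returns the cluster centers and the membership matrix."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        # random initial membership matrix U (n x c), rows normalised to sum to 1
        U = rng.random((n, c))
        U /= U.sum(axis=1, keepdims=True)
        for _ in range(iters):
            Um = U ** m
            # cluster centers as membership-weighted means of the data points
            centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
            # squared Euclidean distances of every point to every center
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            d2 = np.fmax(d2, 1e-12)  # avoid division by zero for coinciding points
            # membership update: u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1/(m-1))
            U_new = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1))).sum(axis=2)
            if np.abs(U_new - U).max() < eps:  # stop when memberships stabilise
                U = U_new
                break
            U = U_new
        return centers, U

The more flexible variants alluded to in the abstract typically modify one of the two update steps in this loop, for instance by replacing the Euclidean distance with a cluster-specific (Mahalanobis-like) distance or by relaxing the constraint that memberships sum to one.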




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Borgelt, C., Timm, H. (2000). Advanced Fuzzy Clustering and Decision Tree Plug-Ins for DataEngine™. In: Azvine, B., Nauck, D.D., Azarmi, N. (eds) Intelligent Systems and Soft Computing. Lecture Notes in Computer Science, vol 1804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720181_8


  • DOI: https://doi.org/10.1007/10720181_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67837-3

  • Online ISBN: 978-3-540-44917-1

  • eBook Packages: Springer Book Archive
