
Clustering of Mixed Data by Integrating Fuzzy, Probabilistic, and Collaborative Clustering Framework

International Journal of Fuzzy Systems

Abstract

Clustering of numerical data is a well-researched problem, and so is clustering of categorical data. For data with mixed attributes, however, the literature is considerably thinner. For numerical data, fuzzy clustering, in particular the fuzzy c-means (FCM) algorithm, is effective and popular, while for categorical data, mixture models are widely used. In this paper, we propose a novel framework for clustering mixed data, that is, data containing both numerical and categorical attributes. Our objective is to find the cluster substructures that are common to both the categorical and the numerical data. Our formulation is inspired by the FCM algorithm (for the numerical data), mixture models (for the categorical data), and the collaborative clustering framework (for aggregating the two); it is an integrated approach that judiciously uses all three components. We apply our algorithm to a few commonly used datasets and compare our results with those of some state-of-the-art methods.
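The abstract names fuzzy c-means as the component that handles the numerical attributes. As a rough illustration of what that component computes, here is a minimal pure-Python sketch of the standard FCM alternating updates with fuzzifier m: prototypes are means weighted by u^m, and memberships are set from inverse relative squared distances. This is not the authors' integrated algorithm; the categorical mixture model and the collaborative coupling are omitted, and the function name `fcm`, the random initialisation, and the stopping rule are assumptions made for illustration.

```python
import random


def fcm(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of numeric feature vectors.

    Hypothetical helper for illustration only. Returns (U, V) where
    U[k][i] is the membership of point k in cluster i and V[i] is the
    prototype (center) of cluster i.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])

    # Random initial memberships; each point's memberships sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([r / s for r in row])

    V = [[0.0] * p for _ in range(c)]
    exponent = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        # Prototype update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        for i in range(c):
            w = [U[k][i] ** m for k in range(n)]
            sw = sum(w)
            V[i] = [sum(w[k] * X[k][j] for k in range(n)) / sw for j in range(p)]

        # Membership update: u_ik = 1 / sum_l (d_ik^2 / d_lk^2)^(1/(m-1))
        max_change = 0.0
        for k in range(n):
            d2 = [sum((X[k][j] - V[i][j]) ** 2 for j in range(p)) for i in range(c)]
            zero = [i for i in range(c) if d2[i] == 0.0]
            if zero:
                # Point coincides with one or more prototypes: share membership among them.
                new = [1.0 / len(zero) if i in zero else 0.0 for i in range(c)]
            else:
                new = [1.0 / sum((d2[i] / d2[l]) ** exponent for l in range(c))
                       for i in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new, U[k])))
            U[k] = new

        # Stop once memberships have effectively stopped changing.
        if max_change < tol:
            break
    return U, V
```

Calling `fcm(X, 2)` on two well-separated groups of 2-D points should give memberships that sum to 1 for every point and, after taking the cluster of maximum membership, place each group in a different cluster. That argmax defuzzification is a common convention, not something the abstract specifies.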

Author information

Correspondence to Nikhil R. Pal.

About this article

Cite this article

Pathak, A., Pal, N.R. Clustering of Mixed Data by Integrating Fuzzy, Probabilistic, and Collaborative Clustering Framework. Int. J. Fuzzy Syst. 18, 339–348 (2016). https://doi.org/10.1007/s40815-016-0168-y

  • DOI: https://doi.org/10.1007/s40815-016-0168-y
