Clustering of Mixed Data by Integrating Fuzzy, Probabilistic, and Collaborative Clustering Framework

Pathak, Arkanath; Pal, Nikhil R.

doi:10.1007/s40815-016-0168-y

Published: 02 April 2016

Volume 18, pages 339–348 (2016)

International Journal of Fuzzy Systems

Abstract

Clustering of numerical data is a very well-researched problem, and so is clustering of categorical data. However, when it comes to clustering data with mixed attributes, the literature is not that rich. For numerical data, fuzzy clustering, in particular the fuzzy c-means (FCM) algorithm, is very effective and popular, while for categorical data, the use of mixture models is quite popular. In this paper, we propose a novel framework for clustering mixed data that contain both numerical and categorical attributes. Our objective is to find the cluster substructures that are common to both the categorical and numerical data. Our formulation is inspired by the FCM algorithm (for dealing with numerical data), mixture models (for dealing with categorical data), and the collaborative clustering framework for aggregating the two; it is an integrated approach that judiciously uses all three components. We apply our algorithm to a few commonly used datasets and compare our results with those of some state-of-the-art methods.
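
For readers unfamiliar with the numerical component named in the abstract, the following is a minimal sketch of the standard fuzzy c-means iteration only. It is not the integrated method proposed in the paper, whose formulation is not given on this page. The function name `fcm`, the parameters `m`, `iters`, `tol`, and `seed`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / wsum
                for d in range(dim)
            ))

        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            # Clamp distances away from zero to avoid division by zero.
            dist = [max(math.dist(points[i], ck), 1e-12) for ck in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Toy data (an assumption): two well-separated groups of 2-D points.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, u = fcm(data, c=2)
labels = [row.index(max(row)) for row in u]
```

Here `labels` hardens the fuzzy memberships by taking the largest entry in each row. In the paper's setting, memberships of this kind would be only one of three ingredients, combined with a mixture model for the categorical attributes through a collaborative clustering scheme.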


Author information

Authors and Affiliations

Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
Arkanath Pathak
Electronics and Communication Sciences Unit, Indian Statistical Institute, Calcutta, West Bengal, 700108, India
Nikhil R. Pal

Corresponding author

Correspondence to Nikhil R. Pal.

About this article

Cite this article

Pathak, A., Pal, N.R. Clustering of Mixed Data by Integrating Fuzzy, Probabilistic, and Collaborative Clustering Framework. Int. J. Fuzzy Syst. 18, 339–348 (2016). https://doi.org/10.1007/s40815-016-0168-y

Received: 15 October 2015
Revised: 02 February 2016
Accepted: 19 February 2016
Published: 02 April 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s40815-016-0168-y
