Comparison of Cluster Validity Measures Based x-Means

Yukihiro Hamasuna; Naohiko Kinoshita; Yasunori Endo

doi:10.20965/jaciii.2016.p0845


JACIII Vol.20 No.5 pp. 845-853

(2016)

Paper:

Yukihiro Hamasuna*, Naohiko Kinoshita**, and Yasunori Endo***

*Department of Informatics, School of Science and Engineering, Kindai University
3-4-1 Kowakae, Higashi-osaka, Osaka 577-8502, Japan

**Research Fellowship for Young Scientists, the Japan Society for the Promotion of Science (JSPS)
1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

***Faculty of Engineering, Information and Systems, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

Received:

September 18, 2015

Accepted:

August 2, 2016

Published:

September 20, 2016

Keywords:

cluster validity measures, x-means, determining the number of clusters, fuzzy partition

Abstract

The x-means determines the suitable number of clusters automatically by executing k-means recursively. The Bayesian Information Criterion is applied to evaluate a cluster partition in the x-means. A novel type of x-means clustering is proposed by introducing cluster validity measures that are used to evaluate the cluster partition and determine the number of clusters instead of the information criterion. The proposed x-means uses cluster validity measures in the evaluation step, and an estimation of the particular probabilistic model is therefore not required. The performances of a conventional x-means and the proposed method are compared for crisp and fuzzy partitions using eight datasets. The comparison shows that the proposed method obtains better results than the conventional method, and that the cluster validity measures for a fuzzy partition are effective in the proposed method.
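The idea summarized in the abstract can be illustrated with a rough sketch. The abstract does not spell out the splitting rule, so the following is an assumption rather than the authors' procedure: each cluster is tentatively divided by 2-means, and the division is kept only if a cluster validity measure of the whole partition improves. The Davies-Bouldin index is used here as one example of a crisp validity measure; the function names `kmeans`, `davies_bouldin`, and `xmeans_validity`, and the parameters `seed` and `min_size`, are hypothetical. Fuzzy partitions are not covered.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns crisp labels in 0..k-1."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better); needs at least two clusters."""
    ids = np.unique(labels)
    centers = np.array([X[labels == c].mean(axis=0) for c in ids])
    # Mean distance of each cluster's points to its own center.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centers[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    worst = []
    for i in range(len(ids)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centers[i] - centers[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))


def xmeans_validity(X, seed=0, min_size=4):
    """Recursive 2-means splitting, judged by a validity measure (assumed rule)."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X, 2, rng)
    if len(np.unique(labels)) < 2:
        return labels  # degenerate start: nothing to evaluate
    queue = list(np.unique(labels))
    next_id = 2
    while queue:
        c = queue.pop()
        idx = np.flatnonzero(labels == c)
        if len(idx) < min_size:
            continue
        sub = kmeans(X[idx], 2, rng)
        if len(np.unique(sub)) < 2:
            continue
        trial = labels.copy()
        trial[idx[sub == 1]] = next_id
        # Keep the split only if the whole partition scores better.
        if davies_bouldin(X, trial) < davies_bouldin(X, labels):
            labels = trial
            queue += [c, next_id]
            next_id += 1
    return labels
```

Calling `xmeans_validity(X)` on an `(n, d)` NumPy array returns one integer label per row, and the number of distinct labels is the estimated number of clusters. Because the acceptance test relies only on distances between points and centers, no probabilistic model has to be fitted, which is the property the abstract emphasizes. Another validity measure could be substituted by swapping `davies_bouldin` and adjusting the direction of the comparison.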

Cite this article as:

Y. Hamasuna, N. Kinoshita, and Y. Endo, “Comparison of Cluster Validity Measures Based x-Means,” J. Adv. Comput. Intell. Intell. Inform., Vol.20 No.5, pp. 845-853, 2016.

References

[1] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, Vol.31, No.8, pp. 651-666, 2010.
[2] J. C. Bezdek, “Pattern Recognition with Fuzzy Objective Function Algorithms,” Plenum Press, New York, 1981.
[3] S. Miyamoto, H. Ichihashi, and K. Honda, “Algorithms for Fuzzy Clustering,” Springer, Heidelberg, 2008.
[4] R. O. Duda and P. E. Hart, “Pattern Classification and Scene Analysis,” Wiley, New York, 1973.
[5] R. N. Davé and R. Krishnapuram, “Robust clustering methods: A unified view,” IEEE Trans. on Fuzzy Systems, Vol.5, No.2, pp. 270-293, 1997.
[6] Y. Hamasuna and Y. Endo, “Sequential Extraction By Using Two Types of Crisp Possibilistic Clustering,” Proc. of the IEEE Int. Conf. on Systems, Man, and Cybernetics (IEEE SMC 2013), pp. 3505-3510, 2013.
[7] S. Miyamoto, Y. Kuroda, and K. Arai, “Algorithms for Sequential Extraction of Clusters by Possibilistic Method and Comparison with Mountain Clustering,” J. of Advanced Computational Intelligence and Intelligent Informatics (JACIII), Vol.12, No.5, pp. 448-453, 2008.
[8] R. R. Yager and D. P. Filev, “Approximate clustering via the mountain method,” IEEE Trans. on Systems, Man and Cybernetics, Vol.24, No.8, pp. 1279-1284, 1994.
[9] T. Ishioka, “An expansion of X-means for automatically determining the optimal number of clusters,” Proc. of the 4th IASTED Int. Conf. on Computational Intelligence, pp. 91-96, 2005.
[10] D. Pelleg and A. Moore, “X-means: Extending K-means with Efficient Estimation of the Number of Clusters,” Proc. of the 17th Int. Conf. on Machine Learning (ICML2000), pp. 727-734, 2000.
[11] D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.1, No.2, pp. 224-227, 1979.
[12] J. C. Dunn, “Well separated clusters and optimal fuzzy partitions,” J. of Cybernetics, Vol.4, pp. 95-104, 1974.
[13] I. Gath and A. B. Geva, “Unsupervised optimal fuzzy clustering,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.11, No.7, pp. 773-780, 1989.
[14] W. Hashimoto, T. Nakamura, and S. Miyamoto, “Comparison and Evaluation of Different Cluster Validity Measures Including Their Kernelization,” J. of Advanced Computational Intelligence and Intelligent Informatics (JACIII), Vol.13, No.3, pp. 204-209, 2009.
[15] W. Wang and Y. Zhang, “On fuzzy cluster validity indices,” Fuzzy Sets and Systems, Vol.158, No.19, pp. 2095-2117, 2007.
[16] X. L. Xie and G. Beni, “A validity measure for fuzzy clustering,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.13, No.8, pp. 841-847, 1991.
[17] B. Rezaee, “A cluster validity index for fuzzy clustering,” Fuzzy Sets and Systems, Vol.161, No.23, pp. 3014-3025, 2010.
[18] G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, Vol.6, No.2, pp. 461-464, 1978.
[19] M. Lichman, “UCI Machine Learning Repository (http://archive.ics.uci.edu/ml),” Irvine, CA: University of California, School of Information and Computer Science, 2013.
[20] W. M. Rand, “Objective criteria for the evaluation of clustering methods,” J. of the American Statistical Association, Vol.66, No.336, pp. 846-850, 1971.
[21] H. Ichihashi, K. Honda, and N. Tani, “Gaussian mixture PDF approximation and fuzzy c-means clustering with entropy regularization,” Proc. of 4th Asian Fuzzy Systems Symp., Vol.1, pp. 217-221, 2000.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
