Abstract
Data sets with missing feature values are prevalent in clustering analysis. Most existing clustering methods for incomplete data rely on imputation of the missing feature values. However, accurate imputations are usually hard to obtain, especially for small or highly corrupted data sets. To address this issue, this paper proposes a robust fuzzy c-means (RFCM) clustering algorithm that does not require imputation. RFCM represents each missing feature value by an interval, which can be easily constructed using the K-nearest neighbors method, and adopts a min-max optimization model to reduce the impact of noise on clustering performance. We give an equivalent tractable reformulation of the min-max optimization problem and propose an efficient solution method based on smoothing and gradient projection techniques. Experiments on UCI data sets validate the effectiveness of the proposed RFCM algorithm in comparison with existing clustering methods for incomplete data.
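The interval representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a partial-distance nearest-neighbor search over jointly observed features and takes the minimum and maximum of the neighbors' observed values as the interval bounds. The function name `knn_intervals`, the parameter `k`, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def knn_intervals(X, k=3):
    """Bound each missing entry (np.nan) of X by a nearest-neighbor interval.

    Assumption (not taken from the paper): neighbors are ranked by a partial
    distance over the features observed in both samples, rescaled by
    d / (number of shared features). Observed entries get a degenerate
    interval with lower == upper == the observed value.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    observed = ~np.isnan(X)
    lower, upper = X.copy(), X.copy()

    for i in range(n):
        missing = np.where(~observed[i])[0]
        if missing.size == 0:
            continue
        # Partial distance from sample i to every other sample.
        dists = np.full(n, np.inf)
        for j in range(n):
            if j == i:
                continue
            shared = observed[i] & observed[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(d / shared.sum() * np.sum(diff ** 2))
        order = np.argsort(dists)
        for f in missing:
            # Keep the k closest samples that actually observe feature f.
            cand = [j for j in order
                    if observed[j, f] and np.isfinite(dists[j])][:k]
            if cand:  # if no neighbor observes f, the bounds stay NaN
                vals = X[cand, f]
                lower[i, f] = vals.min()
                upper[i, f] = vals.max()
    return lower, upper

# Toy data: the second sample is missing its second feature.
X_demo = np.array([[1.0, 2.0],
                   [1.1, np.nan],
                   [0.9, 1.8],
                   [5.0, 9.0],
                   [1.2, 2.2]])
lower, upper = knn_intervals(X_demo, k=2)
print(lower[1, 1], upper[1, 1])
```

In this toy example the two nearest neighbors of the incomplete sample are the first and last rows, so the missing value is bounded by their second-feature values. A robust clustering step would then treat each missing entry as any point in its interval rather than as a single imputed number.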
S. Song—This work was supported by the Major Program of the National Natural Science Foundation of China under Grant 41427806, the National Natural Science Foundation of China under Grants 61503211 and 9152002, and the Project of China Ocean Association under Grant DYXM-125-25-02.
© 2017 Springer Nature Singapore Pte Ltd.
Cite this paper
Li, J., Song, S., Zhang, Y., Li, K. (2017). A Robust Fuzzy c-Means Clustering Algorithm for Incomplete Data. In: Yue, D., Peng, C., Du, D., Zhang, T., Zheng, M., Han, Q. (eds) Intelligent Computing, Networked Control, and Their Engineering Applications. ICSEE LSMS 2017 2017. Communications in Computer and Information Science, vol 762. Springer, Singapore. https://doi.org/10.1007/978-981-10-6373-2_1
DOI: https://doi.org/10.1007/978-981-10-6373-2_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6372-5
Online ISBN: 978-981-10-6373-2
eBook Packages: Computer Science, Computer Science (R0)