Abstract
Partially missing data are a pervasive problem in cluster analysis. We propose a hybrid algorithm that combines fuzzy clustering with particle swarm optimization (PSO) for clustering incomplete data, in which missing attributes are represented as intervals. We further develop a neighbor interval reconstruction (NIR) method based on pre-classification results. It estimates the nearest-neighbor interval of each missing attribute with the nearest-neighbor rule while preventing interval endpoints from being determined by samples of different classes, which improves the accuracy of the missing-attribute intervals and makes the imputation more robust. The hybrid of PSO and fuzzy c-means is then used to cluster the resulting interval-valued data set; the global search ability of PSO can improve clustering accuracy compared with gradient-based optimization. Experimental results on several UCI data sets show the superiority of the proposed NIR hybrid algorithm.
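To make the interval representation concrete, the following is a minimal sketch of plain nearest-neighbor interval estimation, not the NIR method itself: it omits the pre-classification step that restricts neighbors to the same class. It assumes missing values are encoded as NaN, that neighbors are ranked by a partial distance computed over the attributes both samples have, and that the interval is the minimum and maximum of the attribute over the `q` nearest neighbors. The function names and the parameter `q` are hypothetical and chosen for illustration only.

```python
import math


def partial_distance(a, b):
    """Squared Euclidean distance over attributes present in both samples,
    rescaled by (total attributes / shared attributes)."""
    shared = [(x, y) for x, y in zip(a, b)
              if not math.isnan(x) and not math.isnan(y)]
    if not shared:
        return math.inf  # nothing to compare on
    return len(a) / len(shared) * sum((x - y) ** 2 for x, y in shared)


def nearest_neighbor_intervals(data, q):
    """Map each missing entry (row, col) to a (low, high) interval taken from
    the q nearest samples that have that attribute observed."""
    intervals = {}
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Rank only samples where attribute j is observed.
            candidates = sorted(
                (partial_distance(row, other), other[j])
                for k, other in enumerate(data)
                if k != i and not math.isnan(other[j])
            )
            neighbors = [v for d, v in candidates[:q] if d < math.inf]
            if neighbors:
                intervals[(i, j)] = (min(neighbors), max(neighbors))
    return intervals


nan = math.nan
data = [
    [1.0, 2.0],
    [1.1, 2.2],
    [0.9, nan],  # second attribute is missing
    [8.0, 9.0],
    [8.2, 9.1],
]
intervals = nearest_neighbor_intervals(data, q=2)
print(intervals)
```

In this toy data set the incomplete sample lies close to the first two samples, so its missing attribute is bounded by their values rather than by the distant pair. A class-aware variant would additionally filter the candidates by a preliminary cluster label before taking the minimum and maximum.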
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 61174115 and 51104044).
Cite this article
Zhang, L., Bing, Z. & Zhang, L. A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data. Pattern Anal Applic 18, 377–384 (2015). https://doi.org/10.1007/s10044-014-0376-8
DOI: https://doi.org/10.1007/s10044-014-0376-8