Abstract
Clustering is one of the most widely used and efficient approaches in data mining for discovering potential structure in data. However, real data sets often contain missing values, caused for example by difficulties and limitations in data acquisition or by random noise. Most clustering methods cannot be applied directly to incomplete data sets. For this reason, this paper proposes a three-way decisions clustering algorithm for incomplete data based on attribute significance and miss rate. Three-way decisions with interval sets naturally partition a cluster into a positive region, a boundary region and a negative region, which makes them well suited to soft clustering. First, the data set is divided into four parts (sufficient data, valuable data, inadequate data and invalid data) according to domain knowledge about attribute significance and miss rate. Second, different strategies based on three-way decisions are devised to handle each of the four types. Experimental results on several data sets provide preliminary evidence of the effectiveness of the proposed algorithm.
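The abstract does not give the concrete rules for the four-way partition or for the per-type strategies, so the following Python sketch shows only one possible reading, under assumptions that are ours and not the authors'. Objects are rows with `None` marking a missing value. Each attribute carries a numeric significance weight. The thresholds `LOW_MISS`, `HIGH_MISS` and `MAX_LOST_SIG`, the distance `ratio` in `assign`, and all function and class names are hypothetical. Distances use the partial distance strategy (a Euclidean distance over observed attributes, rescaled to the full dimensionality), and the assignment rule is a rough k-means style stand-in, not the paper's method. A cluster is stored as an interval set: a positive region and a boundary region, with the negative region left implicit.

```python
from dataclasses import dataclass, field
import math

# Hypothetical thresholds: the abstract does not state the authors' actual rules.
LOW_MISS = 0.4      # at most 40% of attribute values missing
HIGH_MISS = 0.6     # more than 60% missing -> treated as invalid
MAX_LOST_SIG = 0.3  # missing attributes may carry at most 30% of total significance


def miss_rate(row):
    """Fraction of attribute values that are missing (None)."""
    return sum(v is None for v in row) / len(row)


def lost_significance(row, significance):
    """Share of total attribute significance carried by the missing attributes."""
    lost = sum(w for v, w in zip(row, significance) if v is None)
    return lost / sum(significance)


def categorize(row, significance):
    """Assign an object to one of the four data types (hypothetical rule)."""
    rate = miss_rate(row)
    if rate == 0:
        return "sufficient"
    if rate > HIGH_MISS:
        return "invalid"
    if rate <= LOW_MISS and lost_significance(row, significance) <= MAX_LOST_SIG:
        return "valuable"
    return "inadequate"


def partial_distance(row, center):
    """Euclidean distance over observed attributes, rescaled to full dimensionality."""
    observed = [(v, c) for v, c in zip(row, center) if v is not None]
    if not observed:
        raise ValueError("object has no observed attributes")
    sq = sum((v - c) ** 2 for v, c in observed)
    return math.sqrt(sq * len(row) / len(observed))


@dataclass
class ThreeWayCluster:
    """Interval-set cluster; the negative region is every object in neither set."""
    positive: set = field(default_factory=set)
    boundary: set = field(default_factory=set)

    def region(self, obj_id):
        if obj_id in self.positive:
            return "positive"
        if obj_id in self.boundary:
            return "boundary"
        return "negative"


def assign(obj_id, row, centers, clusters, ratio=1.3):
    """Hypothetical three-way assignment: if no other center is within `ratio`
    times the nearest distance, the object goes to the nearest cluster's positive
    region; otherwise it goes to the boundary region of every such close cluster."""
    dists = [partial_distance(row, c) for c in centers]
    best = min(range(len(centers)), key=lambda k: dists[k])
    close = [k for k in range(len(centers))
             if k != best and dists[k] <= ratio * dists[best]]
    if not close:
        clusters[best].positive.add(obj_id)
    else:
        for k in [best] + close:
            clusters[k].boundary.add(obj_id)


if __name__ == "__main__":
    significance = [0.5, 0.25, 0.25]  # hypothetical attribute significance weights
    data = {"a": [1.0, 2.0, 3.0], "b": [1.1, None, 2.9],
            "c": [None, None, 5.0], "d": [None, 2.0, 3.0], "e": [3.0, 4.0, 5.0]}
    centers = [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]
    clusters = [ThreeWayCluster(), ThreeWayCluster()]
    for oid, row in data.items():
        kind = categorize(row, significance)
        # Only sufficient and valuable objects are placed in this sketch.
        if kind in ("sufficient", "valuable"):
            assign(oid, row, centers, clusters)
        print(oid, kind, [c.region(oid) for c in clusters])
```

In this toy run, `a` and `b` fall in the positive region of the first cluster, `e` (equidistant from both centers) falls in the boundary region of both, and `c` (invalid) and `d` (inadequate) are left unplaced, because the abstract does not say how the paper treats those two types.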
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Yu, H., Su, T., Zeng, X. (2014). A Three-Way Decisions Clustering Algorithm for Incomplete Data. In: Miao, D., Pedrycz, W., Ślęzak, D., Peters, G., Hu, Q., Wang, R. (eds) Rough Sets and Knowledge Technology. RSKT 2014. Lecture Notes in Computer Science, vol 8818. Springer, Cham. https://doi.org/10.1007/978-3-319-11740-9_70
DOI: https://doi.org/10.1007/978-3-319-11740-9_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11739-3
Online ISBN: 978-3-319-11740-9
eBook Packages: Computer Science, Computer Science (R0)