MP-KMeans: K-Means with Missing Pattern for Data of Missing Not at Random

Zhou, Ruifeng; Yu, Hong

doi:10.1007/978-3-031-21244-4_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13633)

Included in the following conference series:

International Joint Conference on Rough Sets

Abstract

K-Means is one of the most popular clustering algorithms. It aims to minimize the sum of pairwise distances within each cluster, and it has been widely used in data analysis, image recognition, and many other fields. However, traditional K-Means cannot handle missing values, which greatly limits its application scenarios. Missing values are ubiquitous in the real world because of sensor failure, high cost, and privacy protection. They cause useful information to be lost from the information system and make data mining difficult. Currently, improvements of K-Means for missing values are generally based on data completion or on a partial distance strategy. These methods achieve satisfactory performance when values are missing randomly, but they fail when data are missing not at random (MNAR). Considering the effect of the missing mechanism, this paper proposes an improved K-Means method for MNAR data that integrates the missing pattern into the distance measurement to assist the clustering process. Experimental results on public datasets show that the proposed method outperforms both data completion-based K-Means and partial distance-based K-Means.
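
The abstract names two ingredients, a partial distance and a missing pattern folded into the distance measurement, but it does not give the formula used by MP-KMeans. The following Python sketch is therefore only an illustration under stated assumptions: `partial_distance` follows the common partial distance strategy (compare observed features only, then rescale), while `pattern_aware_distance`, its `c_mask` argument (the fraction of cluster members missing each feature), and the `weight` parameter are hypothetical names and a hypothetical way of combining the two terms, not the authors' method.

```python
import numpy as np


def partial_distance(x, c):
    """Squared partial distance between a sample x (NaN = missing) and a
    fully observed centroid c: use observed features only, then rescale
    by (total features / observed features)."""
    observed = ~np.isnan(x)
    n_obs = int(observed.sum())
    if n_obs == 0:
        return 0.0  # nothing observed, so the values carry no information
    diff = x[observed] - c[observed]
    return float(x.size / n_obs * np.dot(diff, diff))


def pattern_aware_distance(x, c, c_mask, weight=1.0):
    """Hypothetical combination (an assumption, not the paper's formula):
    partial distance plus a penalty for the gap between the sample's
    missing pattern and the cluster's average missing pattern c_mask."""
    mask = np.isnan(x).astype(float)  # 1.0 where the feature is missing
    pattern_gap = float(np.sum((mask - c_mask) ** 2))
    return partial_distance(x, c) + weight * pattern_gap


# Two clusters with identical centroid values but different missing patterns.
x = np.array([1.0, np.nan, 3.0])
centroid = np.array([1.0, 2.0, 3.0])
mask_a = np.array([0.0, 1.0, 0.0])  # members of cluster A always miss feature 2
mask_b = np.array([0.0, 0.0, 0.0])  # members of cluster B are fully observed

d_a = pattern_aware_distance(x, centroid, mask_a)
d_b = pattern_aware_distance(x, centroid, mask_b)
```

With partial distance alone the two clusters are indistinguishable for `x`, since the observed features match both centroids exactly. Under the assumed pattern term, `x` is closer to cluster A, whose members share its missing pattern. This is the kind of signal that is only informative when missingness depends on the data, as in the MNAR setting.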

Acknowledgements

This work was jointly supported by the National Natural Science Foundation of China (62136002, 61876027) and the Natural Science Foundation of Chongqing (cstc2022ycjh-bgzxm0004).

Author information

Authors and Affiliations

Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Ruifeng Zhou & Hong Yu

Corresponding author

Correspondence to Hong Yu.

Editor information

Editors and Affiliations

University of Regina, Regina, SK, Canada
JingTao Yao
Iwate Prefectural University, Takizawa, Iwate, Japan
Hamido Fujita
Shanghai University, Shanghai, China
Xiaodong Yue
Tongji University, Shanghai, China
Duoqian Miao
University of Kansas, Lawrence, KS, USA
Jerzy Grzymala-Busse
Soochow University, Suzhou, Jiangsu, China
Fanzhang Li

About this paper

Cite this paper

Zhou, R., Yu, H. (2022). MP-KMeans: K-Means with Missing Pattern for Data of Missing Not at Random. In: Yao, J., Fujita, H., Yue, X., Miao, D., Grzymala-Busse, J., Li, F. (eds) Rough Sets. IJCRS 2022. Lecture Notes in Computer Science, vol 13633. Springer, Cham. https://doi.org/10.1007/978-3-031-21244-4_18

DOI: https://doi.org/10.1007/978-3-031-21244-4_18
Published: 11 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21243-7
Online ISBN: 978-3-031-21244-4
eBook Packages: Computer Science, Computer Science (R0)
