Abstract
To address the problem of preserving privacy during data mining, this paper proposes an improved K-Means algorithm over encrypted data, called HK-means++, which uses homomorphic encryption to build security protocols for multiplication, distance calculation, and comparison over encrypted data. These protocols are then applied within the improved clustering framework. To prevent privacy leakage while the distances between sample points and center points are computed, the algorithm hides the cluster centers, which prevents an attacker from inferring the cluster to which a user belongs. This reduces, to some extent, the risk of leaking private data during cluster mining. The traditional K-Means algorithm is well known to depend heavily on its initial values; we address this dependence in order to reduce the number of iterations and improve clustering efficiency. Experimental results show that the proposed HK-Means algorithm achieves good clustering performance with reduced running time.
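The dependence on initial values can be illustrated with a small seeding sketch. The abstract does not spell out the initialization step, so the code below assumes that the "++" in the algorithm name refers to k-means++-style seeding, in which each new center is sampled with probability proportional to its squared distance from the nearest center already chosen. It works on plaintext points only; none of the homomorphic protocols are shown, and the names `squared_distance` and `kmeanspp_seeds` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial centers with D^2 weighting (plaintext sketch only)."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # The first center is drawn uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample one point with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Refresh the nearest-center distances with the new center.
        d2 = [min(d, squared_distance(p, nxt)) for d, p in zip(d2, points)]
    return centers
```

Because points far from the existing centers are more likely to be drawn, the seeds tend to spread across the data, which is the usual argument for why this kind of initialization needs fewer iterations than uniform random starts. Calling `kmeanspp_seeds(points, 2, random.Random(0))` on a list of coordinate tuples returns two of the input points as starting centers.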
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China under Grant No. 61872069, in part by the Fundamental Research Funds for the Central Universities (N171704005), and in part by the Shenyang Science and Technology Plan Projects (18-013-0-01).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, C., Wang, A., Liu, X., Xu, J. (2019). Research on K-Means Clustering Algorithm Over Encrypted Data. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol 11983. Springer, Cham. https://doi.org/10.1007/978-3-030-37352-8_16
DOI: https://doi.org/10.1007/978-3-030-37352-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37351-1
Online ISBN: 978-3-030-37352-8
eBook Packages: Computer Science, Computer Science (R0)