Abstract
Clustering is a significant unsupervised machine learning task widely used for data mining and analysis. Fully homomorphic encryption (FHE) allows data owners to outsource privacy-preserving computations without interaction. In this paper, we propose a fully privacy-preserving, effective, and efficient clustering scheme based on CKKS, in which we construct two iterative formulas to solve the challenging ciphertext comparison and division problems, respectively. Although our scheme already outperforms existing work, executing it on the MNIST and CIFAR-10 datasets still results in unacceptable run time and memory consumption. To address these issues, we propose a block privacy-preserving clustering algorithm that splits records into subvectors and clusters these subvectors. Experimental results show that the clustering accuracy of our original algorithm is almost equivalent to that of the classical k-means algorithm. Compared to a state-of-the-art FHE-based scheme, our original algorithm is not only more accurate but also 4 orders of magnitude faster. Experiments on our block algorithm show that its run time and memory consumption are greatly reduced.
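The abstract does not state the two iterative formulas, so the following is only a plaintext sketch of the general idea, under stated assumptions: CKKS natively supports additions and multiplications, so division and comparison are typically replaced by polynomial iterations. Here the division uses the standard Newton iteration for a reciprocal, and the comparison uses a common odd-polynomial iteration that pushes a value in [-1, 1] toward its sign. The function names, the bound `x_max`, and the iteration counts are hypothetical choices for illustration and are not taken from the paper.

```python
def approx_reciprocal(x, x_max=100.0, iters=12):
    """Approximate 1/x for 0 < x <= x_max using only + and *.

    Newton iteration y <- y * (2 - x * y). The initial guess 1/x_max is a
    public constant (assumed known bound), so no encrypted division is needed.
    """
    y = 1.0 / x_max
    for _ in range(iters):
        y = y * (2.0 - x * y)
    return y


def approx_sign(t, iters=15):
    """Approximate sign(t) for t in [-1, 1], t != 0, using only + and *.

    Iterates the odd polynomial f(t) = (3t - t^3) / 2, whose attracting
    fixed points are -1 and +1.
    """
    for _ in range(iters):
        t = (3.0 * t - t * t * t) / 2.0
    return t


def approx_compare(a, b, iters=15):
    """Return a value close to 1 if a > b and close to 0 if a < b.

    Assumes a and b are already scaled into [0, 1], so a - b lies in [-1, 1].
    """
    return (approx_sign(a - b, iters) + 1.0) / 2.0


if __name__ == "__main__":
    print(approx_reciprocal(4.0))    # close to 0.25
    print(approx_compare(0.7, 0.2))  # close to 1
    print(approx_compare(0.2, 0.7))  # close to 0
```

In an encrypted setting each loop iteration consumes multiplicative depth, so the iteration counts trade accuracy against ciphertext level budget; values that are very close together need more comparison iterations to separate.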
This research is partially supported by Zhongguancun Laboratory.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, M., Wang, L., Zhang, X., Liu, Z., Wang, Y., Bao, H. (2024). Efficient Clustering on Encrypted Data. In: Pöpper, C., Batina, L. (eds) Applied Cryptography and Network Security. ACNS 2024. Lecture Notes in Computer Science, vol 14583. Springer, Cham. https://doi.org/10.1007/978-3-031-54770-6_9
DOI: https://doi.org/10.1007/978-3-031-54770-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54769-0
Online ISBN: 978-3-031-54770-6
eBook Packages: Computer Science, Computer Science (R0)