Efficient Clustering on Encrypted Data

  • Conference paper
Applied Cryptography and Network Security (ACNS 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14583)

Abstract

Clustering is an important unsupervised machine learning task widely used for data mining and analysis. Fully homomorphic encryption (FHE) allows data owners to outsource privacy-preserving computations without interaction. In this paper, we propose a fully privacy-preserving, effective, and efficient clustering scheme based on CKKS, in which we construct two iterative formulas to solve the challenging ciphertext comparison and division problems, respectively. Although our scheme already outperforms existing work, executing it on the MNIST and CIFAR-10 datasets still results in unacceptable run time and memory consumption. To address these issues, we propose a block privacy-preserving clustering algorithm that splits records into subvectors and clusters these subvectors. Experimental results show that the clustering accuracy of our original algorithm is almost equivalent to that of the classical k-means algorithm. Compared to a state-of-the-art FHE-based scheme, our original algorithm not only achieves higher accuracy but is also four orders of magnitude faster. Experiments with our block algorithm show that its run time and memory consumption are greatly reduced.
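The abstract does not state the two iterative formulas, so the following is only a plaintext sketch of the general idea, not the authors' construction. It assumes a Newton-style iteration for the reciprocal and an iterated odd polynomial for the sign function, both of which use only additions and multiplications (the operations CKKS supports on ciphertexts). The function names, iteration counts, and input ranges are illustrative assumptions, and the block-splitting helper only shows how a record could be cut into subvectors.

```python
# Plaintext simulation only: every step uses just + and *, which is what
# makes such iterations candidates for evaluation under CKKS.
# All names and parameters here are hypothetical, not taken from the paper.


def approx_reciprocal(a, iterations=6):
    """Approximate 1/a for 0 < a < 2 without a division operation."""
    x = 1.0  # initial guess; the iteration converges when 0 < a * x < 2
    for _ in range(iterations):
        x = x * (2.0 - a * x)  # Newton step for f(x) = 1/x - a
    return x


def approx_sign(x, iterations=10):
    """Approximate sign(x) for x in [-1, 1] by iterating (3x - x^3) / 2.

    Inputs very close to 0 need more iterations to reach -1 or +1.
    """
    for _ in range(iterations):
        x = 0.5 * x * (3.0 - x * x)
    return x


def approx_greater(a, b, iterations=10):
    """Return a value near 1.0 if a > b and near 0.0 if a < b (a, b in [0, 1])."""
    return 0.5 * (approx_sign(a - b, iterations) + 1.0)


def split_into_blocks(record, block_size):
    """Split one record into consecutive subvectors of at most block_size entries."""
    return [record[i:i + block_size] for i in range(0, len(record), block_size)]


if __name__ == "__main__":
    print(approx_reciprocal(0.5))
    print(approx_greater(0.7, 0.3), approx_greater(0.3, 0.7))
    print(split_into_blocks([1, 2, 3, 4, 5], 2))
```

In an encrypted setting, each loop iteration would consume multiplicative depth, so the iteration count trades accuracy against ciphertext level budget; inputs would also need to be scaled into the stated ranges beforehand.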

This research is partially supported by Zhongguancun Laboratory.

Author information

Corresponding author

Correspondence to Xiaoping Zhang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, M., Wang, L., Zhang, X., Liu, Z., Wang, Y., Bao, H. (2024). Efficient Clustering on Encrypted Data. In: Pöpper, C., Batina, L. (eds) Applied Cryptography and Network Security. ACNS 2024. Lecture Notes in Computer Science, vol 14583. Springer, Cham. https://doi.org/10.1007/978-3-031-54770-6_9

  • DOI: https://doi.org/10.1007/978-3-031-54770-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54769-0

  • Online ISBN: 978-3-031-54770-6

  • eBook Packages: Computer Science, Computer Science (R0)
