Skip to main content

Efficient Privacy Preserving Distributed K-Means for Non-IID Data

  • Conference paper
  • First Online:
Book cover Advances in Intelligent Data Analysis XIX (IDA 2021)

Abstract

Privacy is becoming a crucial requirement in many machine learning systems. In this paper we introduce an efficient and secure distributed K-Means algorithm, that is robust to non-IID data. The base idea of our proposal consists in each client computing the K-Means algorithm locally, with a variable number of clusters. The server will use the resultant centroids to apply the K-Means algorithm again, discovering the global centroids. To maintain the client’s privacy, homomorphic encryption and secure aggregation is used in the process of learning the global centroids. This algorithm is efficient and reduces transmission costs, since only the local centroids are used to find the global centroids. In our experimental evaluation, we demonstrate that our strategy achieves a similar performance to the centralized version even in cases where the data follows an extreme non-IID form.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://github.com/deric/clustering-benchmark.

  2. 2.

    We assume a constant time complexity for multiplication between the encrypted centroids and the plaintext global centroids, according to [20].

References

  1. Bonawitz, K., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191 (2017)

    Google Scholar 

  2. Celebi, M.E., Kingravi, H.A., Vela, P.A.: A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 40(1), 200–210 (2013)

    Article  Google Scholar 

  3. Clifton, C., Tassa, T.: On syntactic anonymity and differential privacy. In: 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), pp. 88–93. IEEE (2013)

    Google Scholar 

  4. Farrand, T., Mireshghallah, F., Singh, S., Trask, A.: Neither private nor fair: impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, pp. 15–19 (2020)

    Google Scholar 

  5. Graepel, T., Lauter, K., Naehrig, M.: ML confidential: machine learning on encrypted data. In: Kwon, T., Lee, M.-K., Kwon, D. (eds.) ICISC 2012. LNCS, vol. 7839, pp. 1–21. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37682-5_1

    Chapter  Google Scholar 

  6. Hu, X., et al.: Privacy-preserving K-means clustering upon negative databases. In: Cheng, L., Leung, A.C.S., Ozawa, S. (eds.) ICONIP 2018. LNCS, vol. 11304, pp. 191–204. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04212-7_17

    Chapter  Google Scholar 

  7. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

    Article  Google Scholar 

  8. Jahangiri, A., Rakha, H.A.: Applying machine learning techniques to transportation mode recognition using mobile phone sensor data. IEEE Trans. Intell. Transp. Syst. 16(5), 2406–2417 (2015)

    Article  Google Scholar 

  9. Januzaj, E., Kriegel, H.P., Pfeifle, M.: Towards effective and efficient distributed clustering. In: Workshop on Clustering Large Data Sets (ICDM2003), Vol. 60 (2003)

    Google Scholar 

  10. Jiang, Z.L., et al.: Efficient two-party privacy-preserving collaborative k-means clustering protocol supporting both storage and computation outsourcing. Inf. Sci. 518, 168–180 (2020)

    Article  MathSciNet  Google Scholar 

  11. Liu, B., et al.: Follow my recommendations: a personalized privacy assistant for mobile app permissions. In: Twelfth Symposium on Usable Privacy and Security (SOUPS 2016), pp. 27–41 (2016)

    Google Scholar 

  12. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)

    Article  MathSciNet  Google Scholar 

  13. Lu, Z., Shen, H.: A convergent differentially private k-means clustering algorithm. In: Yang, Q., Zhou, Z.-H., Gong, Z., Zhang, M.-L., Huang, S.-J. (eds.) PAKDD 2019. LNCS (LNAI), vol. 11439, pp. 612–624. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16148-4_47

    Chapter  Google Scholar 

  14. McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 54, pp. 1273–1282. PMLR (2017)

    Google Scholar 

  15. Navidi, W., Murphy Jr., W.S., Hereman, W.: Statistical methods in surveying by trilateration. Comput. Stat. Data Anal. 27(2), 209–227 (1998)

    Article  Google Scholar 

  16. Palacio-Niño, J., Berzal, F.: Evaluation metrics for unsupervised learning algorithms. arXiv preprint arXiv:1905.05667 (2019)

  17. Sarker, I.H., Hoque, M.M., Uddin, M.K., Alsanoosy, T.: Mobile data science and intelligent apps: concepts, AI-based modeling and research directions. Mob. Netw. Appl. 1–19 (2020). https://doi.org/10.1007/s11036-020-01650-z

  18. Schellekens, V., Chatalic, A., Houssiau, F., De Montjoye, Y.A., Jacques, L., Gribonval, R.: Differentially private compressive k-means. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7933–7937. IEEE (2019)

    Google Scholar 

  19. Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web. p. 1177–1178. WWW 2010, Association for Computing Machinery (2010)

    Google Scholar 

  20. Microsoft SEAL (release 3.5), Microsoft Research, Redmond, WA (2020)

    Google Scholar 

  21. Soliman, A., Girdzijauskas, S., Bouguelia, M.-R., Pashami, S., Nowaczyk, S.: Decentralized and adaptive K-means clustering for non-IID data using hyperLogLog counters. In: Lauw, H.W., Wong, R.C.-W., Ntoulas, A., Lim, E.-P., Ng, S.-K., Pan, S.J. (eds.) PAKDD 2020. LNCS (LNAI), vol. 12084, pp. 343–355. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47426-3_27

    Chapter  Google Scholar 

  22. Su, D., Cao, J., Li, N., Bertino, E., Jin, H.: Differentially private k-means clustering. In: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 26–37. ACM (2016)

    Google Scholar 

  23. Thiagarajan, A., et al.: Vtrack: accurate, energy-aware road traffic delay estimation using mobile phones. In: Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems, SenSys 2009, pp. 85–98. Association for Computing Machinery (2009)

    Google Scholar 

  24. Triebe, O.J., Rajagopal, R.: Federated K-Means: clustering algorithm and proof of concept (2020)

    Google Scholar 

  25. Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206–215. KDD 2003 (2003)

    Google Scholar 

  26. Xing, K., Hu, C., Yu, J., Cheng, X., Zhang, F.: Mutual privacy preserving \( k \)-means clustering in social participatory sensing. IEEE Trans. Ind. Inform. 13(4), 2066–2076 (2017)

    Article  Google Scholar 

  27. Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)

    Article  Google Scholar 

  28. Yin, H., Zhang, J., Xiong, Y., Huang, X., Deng, T.: PPK-means: achieving privacy-preserving clustering over encrypted multi-dimensional cloud data. Electronics 7(11), 310 (2018)

    Article  Google Scholar 

  29. Yuan, C., Yang, H.: Research on k-value selection method of k-means clustering algorithm. J. Multi. Sci. J. 2(2), 226–235 (2019)

    Google Scholar 

  30. Yuan, J., Tian, Y.: Practical privacy-preserving mapreduce based k-means clustering over large-scale dataset. IEEE Trans. Cloud Comput. 7(2), 568–579 (2019)

    Article  Google Scholar 

  31. Zhang, W., Li, C., Peng, G., Chen, Y., Zhang, Z.: A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mech. Syst. Signal Process. 100, 439–453 (2018)

    Article  Google Scholar 

Download references

Acknowledgements

The work presented in this paper was carried out in the scope of project COP-MODE, that has received funding from the European Union’s Horizon 2020 research and innovation programme under the NGI_TRUST grant agreement no 825618, and the project AIDA: Adaptive, Intelligent and Distributed Assurance Platform (POCI-01-0247-FEDER-045907), co-financed by the European Regional Development Fund through the COMPETE2020 program and by the Portuguese Foundation for Science and Technology (FCT) under the CMU Portugal Program. Ricardo Mendes wishes to acknowledge the Portuguese funding institution FCT - Foundation for Science and Technology for supporting his research under the Ph.D. grant SFRH/BD/128599/2017.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to André Brandão .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Brandão, A., Mendes, R., Vilela, J.P. (2021). Efficient Privacy Preserving Distributed K-Means for Non-IID Data. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science(), vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_35

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-74251-5_35

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74250-8

  • Online ISBN: 978-3-030-74251-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics