Abstract
k-means clustering, which partitions data records into clusters such that records in the same cluster are close to each other, has many important applications, such as image segmentation and gene detection. Although k-means clustering has been studied extensively, most existing schemes are not designed for peer-to-peer (P2P) networks, which pose several efficiency and security challenges for clustering over distributed data. In this paper, we propose a novel privacy-preserving k-means clustering scheme over distributed data in P2P networks that achieves local synchronization and privacy protection. Specifically, we design a secure aggregation protocol and a secure division protocol, both based on homomorphic encryption, to securely compute clusters without revealing the private data of any individual peer. Moreover, we propose a novel message encoding method to improve the performance of the aggregation protocol. We formally prove that the proposed scheme is secure under the semi-honest model and demonstrate its performance.
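To make the idea of aggregation under homomorphic encryption concrete, the following is a minimal sketch and not the protocol of this paper. It assumes a textbook Paillier cryptosystem with the generator fixed to g = n + 1, toy-sized primes, non-negative one-dimensional data, and a hypothetical fixed-point factor `SCALE`. All function names are illustrative. For simplicity a single key holder decrypts the aggregated sum and count and divides them in the clear, which the secure division protocol mentioned above is designed to avoid.

```python
import math
import secrets

SCALE = 1000  # fixed-point factor (assumption): keep three decimal places


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier keys from two Mersenne primes; far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Enc(m) = (1 + m*n) * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # uniform in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


def aggregate_centroid(peer_points):
    """Each peer encrypts its local sum and count; only the totals are decrypted."""
    n, priv = keygen()
    enc_sum = encrypt(n, 0)
    enc_cnt = encrypt(n, 0)
    for points in peer_points:
        local_sum = round(sum(points) * SCALE)  # fixed-point encoding
        enc_sum = add(n, enc_sum, encrypt(n, local_sum))
        enc_cnt = add(n, enc_cnt, encrypt(n, len(points)))
    total = decrypt(n, priv, enc_sum)
    count = decrypt(n, priv, enc_cnt)
    return total / SCALE / count  # plaintext division: a simplification


if __name__ == "__main__":
    # Three peers holding points assigned to the same cluster.
    print(aggregate_centroid([[1.0, 2.0], [3.0], [4.0, 6.0]]))
```

In this sketch the aggregating party only ever handles ciphertexts of the per-peer sums and counts, and the decrypted values are the pooled totals, from which the centroid of the five example points (3.2) is recovered.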
Acknowledgements
This work is partly supported by the National Key Research and Development Program of China (No. 2017YFB0802300), the Natural Science Foundation of China (No. 61602240), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX18_0305), and the Research Fund of Guangxi Key Laboratory of Trusted Software (No. kx201906).
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the Topical Collection: Special Issue on Security and Privacy in Machine Learning Assisted P2P Networks
Guest Editors: Hongwei Li, Rongxing Lu and Mohamed Mahmoud
About this article
Cite this article
Zhu, Y., Li, X. Privacy-preserving k-means clustering with local synchronization in peer-to-peer networks. Peer-to-Peer Netw. Appl. 13, 2272–2284 (2020). https://doi.org/10.1007/s12083-020-00881-x
DOI: https://doi.org/10.1007/s12083-020-00881-x