Privacy-preserved federated clustering with Non-IID data via GANs

Zhao, Jianzhe; Wang, Wenji; Wang, Jiabao; Zhang, Songyang; Fan, Zhelin; Matwin, Stan

doi:10.1007/s11227-025-07006-2

Published: 17 February 2025

Volume 81, article number 512 (2025)

The Journal of Supercomputing

Jianzhe Zhao¹,
Wenji Wang¹,
Jiabao Wang²,
Songyang Zhang¹,
Zhelin Fan¹ &
Stan Matwin³

Abstract

Federated clustering (FedC) groups participants using global similarity measures and then trains on each cluster independently to improve global accuracy. As an unsupervised federated learning approach, FedC operates on distributed, unlabeled data while preserving privacy. It faces two challenges, however: non-independent and identically distributed (Non-IID) client data make the global clustering structure fragile, and shared gradients can leak private information. In response, this study introduces GFC-DP, a privacy-preserving federated clustering algorithm for Non-IID data based on generative adversarial networks (GANs), to address both data heterogeneity and privacy protection. The algorithm uses GANs to generate synthetic data, leveraging global information to construct robust clustering structures. As the first work to introduce a client selection strategy into GAN training, it defines a client evaluation equation and selects the better-performing clients to participate in training, which improves the performance of the global GAN. Gaussian noise is added during GAN training to strengthen privacy and to counter model inversion and membership inference attacks. One-shot FedC is then performed on the client side, based on global centroids, to obtain a stable global clustering structure. We conducted comprehensive experiments on the MNIST, CIFAR-10, Rotated MNIST, and Rotated CIFAR-10 datasets. The results show that, in Non-IID scenarios, GFC-DP outperforms comparable algorithms on image classification tasks in both GAN performance and clustering effectiveness.
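Two of the steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say where the Gaussian noise is injected or how it is calibrated, so the first function assumes a common pattern (clip an update to a fixed L2 norm, then add Gaussian noise scaled to that bound). The second function assumes that the client-side one-shot step assigns each local point to its nearest global centroid under Euclidean distance. The names `clip_and_noise`, `assign_to_centroids`, `clip_norm`, and `noise_multiplier` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def clip_and_noise(grad: Sequence[float], clip_norm: float,
                   noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip `grad` to L2 norm `clip_norm`, then add Gaussian noise.

    Assumption: clip-then-noise with sigma = noise_multiplier * clip_norm.
    The paper's actual noise placement and calibration are not stated
    in the abstract.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    # Scale down only when the norm exceeds the bound; never scale up.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm
    return [g * scale + rng.gauss(0.0, sigma) for g in grad]


def assign_to_centroids(points: Sequence[Sequence[float]],
                        centroids: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each local point, the index of the nearest centroid.

    Assumption: squared Euclidean distance; the centroids are taken as
    already computed and broadcast by the server.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
        for p in points
    ]
```

A client could call `clip_and_noise(update, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random())` before sharing an update, and `assign_to_centroids(local_points, global_centroids)` once the global centroids arrive. The values shown are placeholders, not settings from the paper.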

Data Availability

All data generated or analyzed during this study are available at tensorflow/g3doc/tutorials/mnist/ and http://www.cs.toronto.edu/kriz/cifar-10-python.tar.gz.

Funding

This work was supported by the National Natural Science Foundation of China under Grant No. 62102074 and the Natural Science Foundation of Liaoning Province No. 2024-MSBA-49.

Author information

Authors and Affiliations

Software College, Northeastern University, Shenyang, Liaoning, China
Jianzhe Zhao, Wenji Wang, Songyang Zhang & Zhelin Fan
Software College, Zhejiang University, Ningbo, Zhejiang, China
Jiabao Wang
Department of Computer Science, Dalhousie University, Halifax, NS, Canada
Stan Matwin

Contributions

Jianzhe Zhao contributed to conceptualization and methodology. Wenji Wang contributed to data curation, software, and writing (original draft preparation). Jiabao Wang contributed to visualization, software, and investigation. Songyang Zhang contributed to software and validation. Zhelin Fan contributed to software and writing (reviewing and editing). Stan Matwin contributed to conceptualization and writing (reviewing).

Corresponding author

Correspondence to Wenji Wang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

Not applicable, as the study does not involve humans or animals.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, J., Wang, W., Wang, J. et al. Privacy-preserved federated clustering with Non-IID data via GANs. J Supercomput 81, 512 (2025). https://doi.org/10.1007/s11227-025-07006-2

Accepted: 30 January 2025
Published: 17 February 2025
DOI: https://doi.org/10.1007/s11227-025-07006-2
