Abstract
Federated clustering (FedC) groups participants using global similarity measures and then trains on the resulting clusters independently to improve global accuracy. As an unsupervised federated learning approach, FedC operates on distributed, unlabeled data while preserving privacy. However, it faces two challenges: non-independent and identically distributed (Non-IID) client data make the global clustering structure fragile, and shared gradients can leak private information. In response, this study introduces GFC-DP, a privacy-preserving federated clustering algorithm for Non-IID data based on generative adversarial networks (GANs), which addresses both data heterogeneity and privacy protection. The algorithm uses GANs to generate synthetic data, leveraging global information to construct a robust clustering structure. Notably, as the first work to introduce a client selection strategy into GAN training, it improves the performance of the global GAN by defining a client evaluation equation and then selecting the better-performing clients to participate in training. Additionally, Gaussian noise is added during GAN training to strengthen privacy and counter model inversion and membership inference attacks. One-shot FedC is then performed on the client side, based on global centroids, to obtain a stable global clustering structure. We conducted comprehensive experiments on the MNIST, CIFAR-10, Rotated MNIST, and Rotated CIFAR-10 datasets. The results demonstrate that, in Non-IID scenarios, GFC-DP outperforms similar algorithms in both GAN performance and clustering effectiveness on image classification tasks.
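Two of the steps named in the abstract, adding Gaussian noise during training and assigning client data to global centroids, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives neither the noise mechanism nor its parameters, so the sketch assumes the common clip-then-add-noise recipe, and the function names `clip_and_add_noise` and `assign_to_centroids` and the parameters `clip_norm` and `noise_multiplier` are hypothetical.

```python
import math
import random


def clip_and_add_noise(grad, clip_norm, noise_multiplier, rng):
    """Clip a gradient to L2 norm `clip_norm`, then add Gaussian noise.

    Assumption: the standard clip-then-noise recipe; the abstract does not
    state the exact mechanism or parameters used by GFC-DP.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    # Scale down only when the gradient exceeds the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    return [g * scale + rng.gauss(0.0, sigma) for g in grad]


def assign_to_centroids(points, centroids):
    """Label each local point with the index of its nearest global centroid."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    noisy_grad = clip_and_add_noise([3.0, 4.0], clip_norm=1.0,
                                    noise_multiplier=1.0, rng=rng)
    labels = assign_to_centroids([(0.1, 0.0), (4.9, 5.2)],
                                 [(0.0, 0.0), (5.0, 5.0)])
    print(noisy_grad, labels)
```

Clipping bounds each update's contribution before noise is added, which is what lets the noise scale be tied to a known sensitivity. The centroid assignment needs only a single pass over local data, which is consistent with the one-shot setting described above.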















Data Availability
All data generated or analyzed during this study are available at tensorflow/g3doc/tutorials/mnist/ and http://www.cs.toronto.edu/kriz/cifar-10-python.tar.gz.
Funding
This work was supported by the National Natural Science Foundation of China under Grant No. 62102074 and by the Natural Science Foundation of Liaoning Province under Grant No. 2024-MSBA-49.
Author information
Contributions
Jianzhe Zhao was involved in conceptualization and methodology. Wenji Wang was involved in data curation, software, and writing the original draft. Jiabao Wang was involved in visualization, software, and investigation. Songyang Zhang was involved in software and validation. Linzhe Fan was involved in software and in reviewing and editing the manuscript. Stan Matwin was involved in conceptualization and in reviewing the manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical approval
Not applicable, as this study does not involve humans or animals.
About this article
Cite this article
Zhao, J., Wang, W., Wang, J. et al. Privacy-preserved federated clustering with Non-IID data via GANs. J Supercomput 81, 512 (2025). https://doi.org/10.1007/s11227-025-07006-2
DOI: https://doi.org/10.1007/s11227-025-07006-2