Abstract
Data imbalance and privacy disclosure have become the main problems in multi-source credit data fusion: the former causes conflicts during fusion, and the latter brings serious security risks. Federated learning protects data privacy, but it introduces high communication costs and inaccurate fusion results. To fuse the data effectively, this paper proposes an approach based on federated distillation learning, which exchanges synthetic distillation data instead of model parameters, reducing time cost and improving accuracy without compromising data privacy. At the same time, the model is trained on local data and learns interactively with the server's model. Specifically, a decision tree model distills knowledge from the credit data, replacing the traditional parameter transfer model. In addition, a Generative Adversarial Network (GAN) is used to balance the data distribution and address data imbalance on the server. Experimental results show that the proposed method improves both utilization performance and imbalanced-data handling by at least 3%.
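The general idea of exchanging distilled predictions rather than model parameters can be illustrated with a minimal sketch. This is not the paper's implementation: the one-feature toy data, the decision boundary at 0.6, the shared `proxy` set, the depth-1 tree in `fit_stump`, and the simple averaging on the server are all assumptions made for illustration, and the GAN-based rebalancing step is not shown.

```python
import random

rng = random.Random(0)
BOUNDARY = 0.6  # hypothetical ground truth: label 1 ("default") when x > 0.6

def make_data(lo, hi, n):
    """Toy one-feature credit data drawn uniformly from [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return xs, [1.0 if x > BOUNDARY else 0.0 for x in xs]

def fit_stump(xs, ys):
    """Fit a depth-1 decision tree; ys may be soft labels in [0, 1]."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # no valid split: predict the mean everywhere
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]

def predict(stump, xs):
    """Return the leaf value (a soft score) for each input."""
    t, ml, mr = stump
    return [ml if x <= t else mr for x in xs]

# Three clients, each seeing a different slice of the feature range (non-IID).
clients = [make_data(0.0, 0.7, 200), make_data(0.3, 1.0, 200), make_data(0.5, 0.9, 200)]

# Shared unlabeled proxy set (assumed); only predictions on it leave a client.
proxy = [i / 49 for i in range(50)]

# Each client trains locally and sends soft predictions, not parameters.
client_soft = [predict(fit_stump(xs, ys), proxy) for xs, ys in clients]

# The server averages the predictions and distills them into its own tree.
avg_soft = [sum(col) / len(col) for col in zip(*client_soft)]
server = fit_stump(proxy, avg_soft)

# Evaluate the distilled server model on fresh data from the full range.
test_x, test_y = make_data(0.0, 1.0, 500)
pred = [1.0 if p > 0.5 else 0.0 for p in predict(server, test_x)]
accuracy = sum(p == y for p, y in zip(pred, test_y)) / len(test_y)
print(f"server accuracy: {accuracy:.3f}")
```

In this sketch the only data crossing the client boundary is one score per proxy point, so the communication cost depends on the size of the proxy set rather than on the size of the model.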
Funding
This work was supported by the National Key Research and Development Program of China (No. 2019YFB1404602).
Author information
Contributions
All authors contributed equally to the article and reviewed the manuscript.
Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
About this article
Cite this article
Zhang, X., Sun, Z., Mao, L. et al. A multi-source credit data fusion approach based on federated distillation learning. Int. J. Mach. Learn. & Cyber. 15, 1153–1164 (2024). https://doi.org/10.1007/s13042-023-02032-z
DOI: https://doi.org/10.1007/s13042-023-02032-z