Abstract
Federated learning (FL) is a recent distributed machine learning paradigm, which allows the data owners to participate in the training process while keeping the data privacy. However, recent studies have shown that data can be still leaked through the gradient sharing mechanism in FL. Increasing batch size is often viewed as a promising defense strategy against data leakage. In this chapter, we provide an overview of data leakage problems in FL, revisit this attack premise, and propose an advanced data leakage attack to efficiently recover batch data from the aggregated gradients. We name our proposed method as c atastrophic d a ta leakage in f ederated l e arning (CAFE). Comparing to existing data leakage attacks, CAFE demonstrates the ability to perform large-batch data leakage attack with high recovery quality. Our experimental results suggest that data participated in FL, especially the vertical case, have a high risk of being leaked from the training gradients.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Beguier C, Tramel EW (2020) SAFER: Sparse secure aggregation for federated learning. arXiv, eprint:200714861
Chen T, Jin X, Sun Y, Yin W (2020) VAFL: a method of vertical asynchronous federated learning. In: International workshop on federated learning for user privacy and data confidentiality in conjunction with ICML
Cheng K, Fan T, Jin Y, Liu Y, Chen T, Yang Q (2019) Secureboost: A lossless federated learning framework. arXiv, eprint:190108755
Dwork C, Smith A, Steinke T, Ullman J, Vadhan S (2015) Robust traceability from trace amounts. In: 2015 IEEE 56th annual symposium on foundations of computer science, USA, pp 650–669
Fan L, Ng K, Ju C, Zhang T, Liu C, Chan CS, Yang Q (2020) Rethinking privacy preserving deep learning: How to evaluate and thwart privacy attacks. arXiv, eprint:200611601
Fredrikson M, Jha S, Ristenpart T (2015) Model inversion attacks that exploit confidence information and basic countermeasures. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security. Association for Computing Machinery, New York, NY, pp 1322–1333
Geiping J, Bauermeister H, Dröge H, Moeller M (2020) Inverting gradients - how easy is it to break privacy in federated learning? In: Advances in neural information processing systems, vol 33, pp 16937–16947
Gonzalez RC, Woods RE (1992) Digital image processing. Addison-Wesley, New York
Guerraoui R, Gupta N, Pinot R, Rouault S, Stephan J (2021) Differential privacy and byzantine resilience in SGD: Do they add up? arXiv, eprint:210208166
Guo S, Zhang T, Xiang T, Liu Y (2020) Differentially private decentralized learning. arXiv, eprint:200607817
Hitaj B, Ateniese G, Pérez-Cruz F (2017) Deep models under the GAN: information leakage from collaborative deep learning. arXiv, eprint:170207464
Huang Y, Song Z, Chen D, Li K, Arora S (2020) TextHide: Tackling data privacy in language understanding tasks. In: The conference on empirical methods in natural language processing
Huang Y, Su Y, Ravi S, Song Z, Arora S, Li K (2020) Privacy-preserving learning via deep net pruning. arXiv, eprint:200301876
Li Z, Huang Z, Chen C, Hong C (2019) Quantification of the leakage in federated learning. In: International workshop on federated learning for user privacy and data confidentiality. West 118–120 Vancouver Convention Center, Vancouver
Li O, Sun J, Yang X, Gao W, Zhang H, Xie J, Smith V, Wang C (2021) Label leakage and protection in two-party split learning. arXiv, eprint:210208504
Liang G, Chawathe SS (2004) Privacy-preserving inter-database operations. In: 2nd Symposium on intelligence and security informatics (ISI 2004), Berlin, Heidelberg, pp 66–82
Liu R, Cao Y, Yoshikawa M, Chen H (2020) FedSel: Federated SGD under local differential privacy with top-k dimension selection. arXiv, eprint:200310637
Liu Y, Kang Y, Zhang X, Li L, Cheng Y, Chen T, Hong M, Yang Q (2020) A communication efficient vertical federated learning framework. arXiv, eprint:191211187
Long Y, Bindschaedler V, Wang L, Bu D, Wang X, Tang H, Gunter CA, Chen K (2018) Understanding membership inferences on well-generalized learning models. arXiv, eprint:180204889
Lyu L, Yu H, Ma X, Sun L, Zhao J, Yang Q, Yu PS (2020) Privacy and robustness in federated learning: Attacks and defenses. arXiv, eprint:201206337
McMahan HB, Moore E, Ramage D, y Arcas BA (2016) Federated learning of deep networks using model averaging. arXiv, eprint:160205629
Melis L, Song C, Cristofaro ED, Shmatikov V (2018) Inference attacks against collaborative learning. In: Proceedings of the 35th annual computer security applications conference. Association for Computing Machinery, New York, NY, pp 148–162
Niu C, Wu F, Tang S, Hua L, Jia R, Lv C, Wu Z, Chen G (2019) Secure federated submodel learning. arXiv, eprint:191102254
Pan X, Zhang M, Yan Y, Zhu J, Yang M (2020) Theory-oriented deep leakage from gradients via linear equation solver. arXiv, eprint:201013356
Qian J, Nassar H, Hansen LK (2021) On the limits to learning input data from gradients. arXiv, eprint:201015718
Scannapieco M, Figotin I, Bertino E, Elmagarmid A (2007) Privacy preserving schema and data matching. In: Proceedings of the ACM SIGMOD international conference on management of data, Beijing, pp 653–664
Shokri R, Shmatikov V (2015) Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Association for Computing Machinery, New York, NY, CCS ’15, pp 1310–1321
Shokri R, Stronati M, Song C, Shmatikov V (2017) Membership inference attacks against machine learning models. In: 2017 IEEE symposium on security and privacy (SP), pp 3–18
So J, Guler B, Avestimehr AS (2021) Byzantine-resilient secure federated learning. arXiv, eprint:200711115
So J, Guler B, Avestimehr AS (2021) Turbo-aggregate: Breaking the quadratic aggregation barrier in secure federated learning. arXiv, eprint:200204156
Sun L, Lyu L (2020) Federated model distillation with noise-free differential privacy. arXiv, eprint:200905537
Sun J, Li A, Wang B, Yang H, Li H, Chen Y (2020) Provable defense against privacy leakage in federated learning from representation perspective. arXiv, eprint:201206043
Trefethen LN, Bau D (1997) Numerical linear algebra. SIAM, Philadelphia
Wei W, Liu L, Loper M, Chow KH, Gursoy ME, Truex S, Wu Y (2020) A framework for evaluating gradient leakage attacks in federated learning. arXiv, eprint:200410397
Wei K, Li J, Ding M, Ma C, Su H, Zhang B, Poor HV (2021) User-level privacy-preserving federated learning: Analysis and performance optimization. arXiv, eprint:200300229
Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: Concept and applications, vol 10. Association for Computing Machinery, New York, NY
Zhao B, Mopuri KR, Bilen H (2020) iDLG: Improved deep leakage from gradients. arXiv, eprint:200102610
Zhu J, Blaschko MB (2021) R-GAP: Recursive gradient attack on privacy. In: International conference on learning representations
Zhu L, Liu Z, Han S (2019) Deep leakage from gradients. In: Advances in neural information processing systems, Vancouver, pp 14774–14784
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jin, X., Chen, PY., Chen, T. (2022). Data Leakage in Federated Learning. In: Ludwig, H., Baracaldo, N. (eds) Federated Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-96896-0_15
Download citation
DOI: https://doi.org/10.1007/978-3-030-96896-0_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96895-3
Online ISBN: 978-3-030-96896-0
eBook Packages: Computer ScienceComputer Science (R0)