
Data Leakage in Federated Learning

  • Chapter in: Federated Learning

Abstract

Federated learning (FL) is a recent distributed machine learning paradigm that allows data owners to participate in model training while keeping their data private. However, recent studies have shown that data can still be leaked through the gradient-sharing mechanism in FL. Increasing the batch size is often viewed as a promising defense against such leakage. In this chapter, we provide an overview of data leakage problems in FL, revisit this defense premise, and propose an advanced data leakage attack that efficiently recovers batch data from the aggregated gradients. We name the proposed method catastrophic data leakage in federated learning (CAFE). Compared to existing data leakage attacks, CAFE can perform large-batch data leakage with high recovery quality. Our experimental results suggest that data involved in FL, especially in the vertical FL setting, are at high risk of being leaked from the training gradients.
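To illustrate the gradient-leakage principle the abstract refers to, here is a minimal sketch (not the chapter's CAFE method, and all names below are illustrative): for a single linear neuron trained with squared loss on one example, the gradient a client shares reveals the training point in closed form, since the gradient with respect to the weights is a scaled copy of the input.

```python
import numpy as np

# Hypothetical single-client setup: linear model f(x) = w.x + b with
# squared loss L = 0.5 * (f(x) - y)^2 on one private example (x, y).
rng = np.random.default_rng(0)
d = 5
w, b = rng.normal(size=d), 0.1            # current global model (known to the attacker)
x_true, y_true = rng.normal(size=d), 1.5  # the client's private training point

# The client computes and shares the gradients of L:
r = w @ x_true + b - y_true               # residual f(x) - y
g_w = r * x_true                          # dL/dw = r * x  (a scaled copy of the input)
g_b = r                                   # dL/db = r

# Attacker-side recovery from the shared gradients alone:
# g_w / g_b = (r * x) / r = x, and y then follows from the residual.
x_rec = g_w / g_b
y_rec = w @ x_rec + b - g_b

print(np.allclose(x_rec, x_true), np.isclose(y_rec, y_true))
```

With a batch of examples the shared gradient is an average over the batch and this closed-form trick no longer applies, which is why larger batches were believed to protect the data; the chapter's attack targets exactly that aggregated-gradient setting.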




Author information

Corresponding author

Correspondence to Xiao Jin.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Jin, X., Chen, PY., Chen, T. (2022). Data Leakage in Federated Learning. In: Ludwig, H., Baracaldo, N. (eds) Federated Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-96896-0_15

  • DOI: https://doi.org/10.1007/978-3-030-96896-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96895-3

  • Online ISBN: 978-3-030-96896-0

  • eBook Packages: Computer Science (R0)
