
Data Leakage in Federated Learning

  • Chapter in: Federated Learning

Abstract

Federated learning (FL) is a recent distributed machine learning paradigm that allows data owners to participate in model training while keeping their data private. However, recent studies have shown that data can still be leaked through the gradient-sharing mechanism in FL. Increasing the batch size is often viewed as a promising defense against such leakage. In this chapter, we provide an overview of data leakage problems in FL, revisit this defense premise, and propose an advanced data leakage attack that efficiently recovers batch data from the aggregated gradients. We name the proposed method catastrophic data leakage in federated learning (CAFE). Compared to existing data leakage attacks, CAFE can perform large-batch data leakage with high recovery quality. Our experimental results suggest that data involved in FL, especially in the vertical FL setting, are at high risk of being leaked from the training gradients.
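To illustrate the gradient-leakage principle the abstract refers to, here is a minimal sketch (not the chapter's CAFE method, and all names below are illustrative): for a single linear neuron trained with squared loss on one example, the gradient a client shares reveals the training point in closed form, since the gradient with respect to the weights is a scaled copy of the input.

```python
import numpy as np

# Hypothetical single-client setup: linear model f(x) = w.x + b with
# squared loss L = 0.5 * (f(x) - y)^2 on one private example (x, y).
rng = np.random.default_rng(0)
d = 5
w, b = rng.normal(size=d), 0.1            # current global model (known to the attacker)
x_true, y_true = rng.normal(size=d), 1.5  # the client's private training point

# The client computes and shares the gradients of L:
r = w @ x_true + b - y_true               # residual f(x) - y
g_w = r * x_true                          # dL/dw = r * x  (a scaled copy of the input)
g_b = r                                   # dL/db = r

# Attacker-side recovery from the shared gradients alone:
# g_w / g_b = (r * x) / r = x, and y then follows from the residual.
x_rec = g_w / g_b
y_rec = w @ x_rec + b - g_b

print(np.allclose(x_rec, x_true), np.isclose(y_rec, y_true))
```

With a batch of examples the shared gradient is an average over the batch and this closed-form trick no longer applies, which is why larger batches were believed to protect the data; the chapter's attack targets exactly that aggregated-gradient setting.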




Author information

Corresponding author

Correspondence to Xiao Jin.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Jin, X., Chen, PY., Chen, T. (2022). Data Leakage in Federated Learning. In: Ludwig, H., Baracaldo, N. (eds) Federated Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-96896-0_15

  • DOI: https://doi.org/10.1007/978-3-030-96896-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96895-3

  • Online ISBN: 978-3-030-96896-0

  • eBook Packages: Computer Science (R0)
