The Applicability of Federated Learning to Official Statistics

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2023 (IDEAL 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14404)

Abstract

This work investigates the potential of Federated Learning (FL) for official statistics and shows how closely the performance of FL models can match that of centralized learning methods. FL is particularly interesting for official statistics because it can safeguard the privacy of data holders, thus facilitating access to a broader range of data. By simulating three different use cases, we gain important insights into the applicability of the technology. The use cases are based on a medical insurance data set, a fine dust pollution data set and a mobile radio coverage data set, all of which are from domains close to official statistics. We provide a detailed analysis of the results, including a comparison of centralized and FL algorithm performance for each simulation. In all three use cases, we were able to train models via FL that reach a performance very close to the centralized model benchmarks. We summarize our key observations and their implications for transferring the simulations into practice. We conclude that FL has the potential to emerge as a pivotal technology in future use cases of official statistics.
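To make the comparison between federated and centralized training concrete, the following is a minimal sketch of federated averaging on a toy linear regression task. It is not the authors' setup: the synthetic data, the three hypothetical clients, the helper names (`local_update`, `federated_averaging`) and all hyperparameters are assumptions chosen only for illustration. Each client runs a few gradient steps on its own data, and a server averages the resulting weights in proportion to client size, so raw records never leave the clients.

```python
import numpy as np


def local_update(w, X, y, lr=0.1, epochs=5):
    """Run a few full-batch gradient steps on one client's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w


def federated_averaging(clients, dim, rounds=50, lr=0.1, epochs=5):
    """Average client models each round, weighted by client sample count."""
    w = np.zeros(dim)
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        # Only model weights are exchanged; X and y stay with each client.
        updates = [local_update(w, X, y, lr, epochs) for X, y in clients]
        w = sum(len(y) / total * u for u, (_, y) in zip(updates, clients))
    return w


# Synthetic data for three hypothetical data holders (assumption, not paper data).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for n in (80, 120, 200):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    clients.append((X, y))

# Centralized benchmark: pool all data and solve least squares directly.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
w_central, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

w_fed = federated_averaging(clients, dim=3)
gap = float(np.max(np.abs(w_fed - w_central)))
print("centralized:", w_central)
print("federated:  ", w_fed)
print("max abs gap:", gap)
```

In this toy setting the clients draw from the same distribution, so the federated weights should land close to the centralized solution. With strongly heterogeneous clients or non-convex models the gap can be larger, which is the kind of question the simulations in the paper examine.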

Notes

  1. Scanner data in consumer price statistics https://www.destatis.de/EN/Service/EXSTAT/Datensaetze/scanner-data.html, accessed on September 29, 2023.

  2. Use of MNO data https://cros-legacy.ec.europa.eu/content/12-use-mno-data_en, accessed on September 29, 2023.

  3. TensorFlow Federated https://tensorflow.org/federated, accessed on September 29, 2023.

  4. PyCaret https://pycaret.org/, accessed on September 29, 2023.

  5. scikit-learn https://scikit-learn.org/, accessed on September 29, 2023.

  6. Code repository for this paper: https://www.github.com/joshua-stock/fl-official-statistics, accessed on September 29, 2023. Note that for the mobile radio coverage simulation, the code has only been executed locally on the private data set, hence it is not included in the repository.

  7. US health insurance dataset https://www.kaggle.com/datasets/teertha/ushealthinsurancedataset, accessed on September 29, 2023.

  8. Air quality and health https://www.who.int/teams/environment-climate-change-and-health/air-quality-and-health/policy-progress/sustainable-development-goals-air-pollution, accessed on September 29, 2023.

  9. Beijing multi-site air-quality data set https://www.kaggle.com/datasets/sid321axn/beijing-multisite-airquality-data-set, accessed on September 29, 2023.

  10. umlaut website https://www.umlaut.com/, accessed on September 29, 2023.

Author information

Corresponding author

Correspondence to Joshua Stock.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Stock, J., Hauke, O., Weißmann, J., Federrath, H. (2023). The Applicability of Federated Learning to Official Statistics. In: Quaresma, P., Camacho, D., Yin, H., Gonçalves, T., Julian, V., Tallón-Ballesteros, A.J. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2023. IDEAL 2023. Lecture Notes in Computer Science, vol 14404. Springer, Cham. https://doi.org/10.1007/978-3-031-48232-8_8

  • DOI: https://doi.org/10.1007/978-3-031-48232-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48231-1

  • Online ISBN: 978-3-031-48232-8

  • eBook Packages: Computer Science, Computer Science (R0)
