Abstract
This work investigates the potential of Federated Learning (FL) for official statistics and shows how closely the performance of FL models can match that of centralized learning methods. FL is particularly interesting for official statistics because it can safeguard the privacy of data holders and thereby facilitate access to a broader range of data. We simulate three use cases to gain insights into the applicability of the technology. The use cases are based on a medical insurance data set, a fine dust pollution data set and a mobile radio coverage data set, all of which come from domains close to official statistics. We provide a detailed analysis of the results, including a comparison of centralized and FL algorithm performance for each simulation. In all three use cases, we were able to train models via FL that reach a performance very close to the centralized model benchmarks. We also summarize our key observations and their implications for transferring the simulations into practice. We conclude that FL has the potential to emerge as a pivotal technology in future use cases of official statistics.
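To make the comparison between centralized and federated training concrete, the following is a minimal sketch of federated averaging on synthetic data. It is not the code used in the paper: the linear model, the synthetic data generator, the learning rate, the number of rounds and all function names are assumptions chosen for illustration only. Each simulated client trains locally on its own data, only model parameters are averaged, and the result is compared with an ordinary least squares fit on the pooled data.

```python
import random


def local_update(params, data, lr=0.1, epochs=5):
    """A few full-batch gradient-descent steps on one client's data (y ~ a*x + b)."""
    a, b = params
    n = len(data)
    for _ in range(epochs):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in data) / n
        a, b = a - lr * grad_a, b - lr * grad_b
    return a, b


def federated_averaging(clients, rounds=100):
    """Average locally trained parameters, weighted by client data size."""
    total = sum(len(d) for d in clients)
    a, b = 0.0, 0.0
    for _ in range(rounds):
        # Raw data never leaves a client; only parameters are exchanged.
        updates = [local_update((a, b), d) for d in clients]
        a = sum(len(d) / total * u[0] for d, u in zip(clients, updates))
        b = sum(len(d) / total * u[1] for d, u in zip(clients, updates))
    return a, b


def centralized_fit(data):
    """Closed-form ordinary least squares on the pooled data (the benchmark)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    a = sxy / sxx
    return a, my - a * mx


def mse(params, data):
    """Mean squared error of the linear model on a data set."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)


def make_clients(n_clients=3, n_samples=200, seed=0):
    """Synthetic, identically distributed client data from y = 2x - 1 + noise (assumption)."""
    rng = random.Random(seed)
    clients = []
    for _ in range(n_clients):
        data = []
        for _ in range(n_samples):
            x = rng.uniform(-1.0, 1.0)
            data.append((x, 2.0 * x - 1.0 + rng.gauss(0.0, 0.1)))
        clients.append(data)
    return clients


clients = make_clients()
pooled = [point for data in clients for point in data]

w_fed = federated_averaging(clients)
w_central = centralized_fit(pooled)

print("federated parameters:  ", w_fed)
print("centralized parameters:", w_central)
print("MSE gap (federated - centralized):", mse(w_fed, pooled) - mse(w_central, pooled))
```

In this idealized setting the clients hold identically distributed data, so the federated model ends up very close to the centralized benchmark. With heterogeneous client data the gap would typically be larger, and this sketch does not model that case.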
Notes
1. Scanner data in consumer price statistics https://www.destatis.de/EN/Service/EXSTAT/Datensaetze/scanner-data.html, accessed on September 29, 2023.
2. Use of MNO data https://cros-legacy.ec.europa.eu/content/12-use-mno-data_en, accessed on September 29, 2023.
3. TensorFlow Federated https://tensorflow.org/federated, accessed on September 29, 2023.
4. PyCaret https://pycaret.org/, accessed on September 29, 2023.
5. scikit-learn https://scikit-learn.org/, accessed on September 29, 2023.
6. Code repository for this paper: https://www.github.com/joshua-stock/fl-official-statistics, accessed on September 29, 2023. Note that for the mobile radio coverage simulation, the code has only been executed locally on the private data set, hence it is not included in the repository.
7. US health insurance dataset https://www.kaggle.com/datasets/teertha/ushealthinsurancedataset, accessed on September 29, 2023.
8. Air quality and health https://www.who.int/teams/environment-climate-change-and-health/air-quality-and-health/policy-progress/sustainable-development-goals-air-pollution, accessed on September 29, 2023.
9. Beijing multi-site air-quality data set https://www.kaggle.com/datasets/sid321axn/beijing-multisite-airquality-data-set, accessed on September 29, 2023.
10. umlaut website https://www.umlaut.com/, accessed on September 29, 2023.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Stock, J., Hauke, O., Weißmann, J., Federrath, H. (2023). The Applicability of Federated Learning to Official Statistics. In: Quaresma, P., Camacho, D., Yin, H., Gonçalves, T., Julian, V., Tallón-Ballesteros, A.J. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2023. IDEAL 2023. Lecture Notes in Computer Science, vol 14404. Springer, Cham. https://doi.org/10.1007/978-3-031-48232-8_8
DOI: https://doi.org/10.1007/978-3-031-48232-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48231-1
Online ISBN: 978-3-031-48232-8
eBook Packages: Computer Science, Computer Science (R0)