Abstract
We propose a privacy-preserving framework using Mondrian k-anonymity with decision trees in a Federated Learning (FL) setting for the horizontally partitioned data. Data heterogeneity in FL makes the data non-IID (Non-Independent and Identically Distributed). We use a novel approach to create non-IID partitions of data by solving an optimization problem. In this work, each device trains a decision tree classifier. Devices share the root node of their trees with the aggregator. The aggregator merges the trees by choosing the most common split attribute and grows the branches based on the split values of the chosen split attribute. This recursive process stops when all the nodes to be merged are leaf nodes. After the merging operation, the aggregator sends the merged decision tree to the distributed devices. Therefore, we aim to build a joint machine learning model based on the data from multiple devices while offering k-anonymity to the participants.
Keywords
This study was partially funded by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Dode, A.: The challenges of implementing general data protection law (GDPR). In: 14th International Conference “Standardization, Protypes and Quality: A Means of Balkan Countries’collaboration”, p. 65 (2018)
Anonymisation and GDPR compliance; an overview - GDPR Summary (2021). https://www.gdprsummary.com/anonymisation-and-gdpr/. Accessed 27 June 2021
McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics, pp. 1273–1282. PMLR, April 2017
Dua, D., Graff, C.: UCI machine learning repository. University of California, School of Information and Computer Science, Irvine (2019). http://archive.ics.uci.edu/ml
Peng, W., Chen, J., Zhou, H.: An implementation of ID3-decision tree learning algorithm. arch.usyd.edu.au/wpeng/DecisionTree2.pdf. Accessed 13 May 2009
Steinberg, D., Colla, P.: CART: classification and regression trees. Top Ten Algorithms Data Min. 9, 179 (2009)
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: 22nd International Conference on Data Engineering (ICDE 2006), p. 25. IEEE, April 2006
Slijepčević, D., Henzl, M., Klausner, L.D., Dam, T., Kieseberg, P., Zeppelzauer, M.: k-anonymity in practice: how generalisation and suppression affect machine learning classifiers. arXiv preprint arXiv:2102.04763 (2021)
Buratović, I., Miličević, M., Žubrinić, K.: Effects of data anonymisation on the data mining results. In: 2012 Proceedings of the 35th International Convention MIPRO, pp. 1619–1623. IEEE, May 2012
Fan, C., Li, P.: Classification acceleration via merging decision trees. In: Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference, pp. 13–22 (2020)
Hall, L., Chawla, N., Bowyer, K.: Combining decision trees learned in parallel. In: Working Notes of the KDD-97 Workshop on Distributed Data Mining, pp. 10–15 (1998)
Bursteinas, B., Long, J.: Merging distributed classifiers. In: Proceedings of 5th World Multi-conference on Systemics, Cybernetics and Informatics (2001)
Andrzejak, A., Langner, F., Zabala, S.: Interpretable models from distributed data via merging of decision trees. In: Proceedings of 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) (2013)
Strecht, P., Mendes-Moreira, J., Soares, C.: Merging decision trees: a case study in predicting student performance. In: Luo, X., Yu, J.X., Li, Z. (eds.) ADMA 2014. LNCS (LNAI), vol. 8933, pp. 535–548. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14717-8_42
Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(05), 557–570 (2002)
Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression (1998)
Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(05), 571–588 (2002)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Heidelberg (2009). https://doi.org/10.1007/978-0-387-84858-7
Torra, V.: A systematic construction of non-I.I.D. data sets from a single dataset, manuscript (2021)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Kwatra, S., Torra, V. (2022). A k-Anonymised Federated Learning Framework with Decision Trees. In: Garcia-Alfaro, J., Muñoz-Tapia, J.L., Navarro-Arribas, G., Soriano, M. (eds) Data Privacy Management, Cryptocurrencies and Blockchain Technology. DPM CBT 2021 2021. Lecture Notes in Computer Science(), vol 13140. Springer, Cham. https://doi.org/10.1007/978-3-030-93944-1_7
Download citation
DOI: https://doi.org/10.1007/978-3-030-93944-1_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93943-4
Online ISBN: 978-3-030-93944-1
eBook Packages: Computer ScienceComputer Science (R0)