ABSTRACT
Most research and applications in machine learning focus on training a model for a particular task, such as churn prediction, using training data held on a single machine or in a data center. In many organizations and industries, however, the training data resides in different, isolated locations. To protect data privacy and security, it is not feasible to gather all the training data in one place and train machine learning models centrally. Federated Learning (FL) is a machine learning technique whose goal is to learn a high-quality model across multiple clients (such as mobile devices) or data centers without ever exchanging their training data. Most existing research on FL focuses on two directions: (a) training parametric models such as neural networks and (b) FL setups with millions of clients. In this work, by contrast, we focus on non-parametric models such as decision trees; specifically, we build decision trees using federated learning and train a random forest model. Our work aims to involve corporate companies, rather than mobile devices, in the federated learning process. We consider a setting where a small number of organizations or companies collaboratively build machine learning models without exchanging their privately held large data sets. We design a federated decision tree-based random forest algorithm and evaluate it on several datasets. Our results demonstrate that every participating company benefits from federated learning through improved model performance. We also show how to incorporate differential privacy into our decision tree-based random forest algorithm.
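To make the cross-silo idea concrete, the following is a minimal, self-contained sketch (not the paper's actual algorithm): each silo fits a decision tree on its private data, only the fitted trees are exchanged, and the combined forest predicts by majority vote. For brevity the "trees" here are depth-1 stumps and the helper names (`train_stump`, `predict`) are illustrative; a real system would use full-depth trees, feature and sample bagging, and differentially private noise.

```python
from collections import Counter

def train_stump(X, y):
    """Train a depth-1 decision stump on one silo's local data:
    pick the (feature, threshold) split minimizing misclassifications."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            # majority class on each side of the threshold
            l_pred = Counter(left).most_common(1)[0][0]
            r_pred = Counter(right).most_common(1)[0][0]
            errors = sum(1 for row, yi in zip(X, y)
                         if (l_pred if row[f] <= t else r_pred) != yi)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_pred, r_pred)
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict(forest, row):
    """Majority vote over all stumps contributed by all silos."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]

# Two silos with private labeled data; only fitted stumps are exchanged.
silo_a = ([[1.0], [2.0], [8.0], [9.0]], [0, 0, 1, 1])
silo_b = ([[1.5], [2.5], [7.5], [8.5]], [0, 0, 1, 1])
forest = [train_stump(X, y) for X, y in (silo_a, silo_b)]
print(predict(forest, [8.0]))  # → 1
```

Note that no raw rows ever leave a silo; only the split parameters (feature index, threshold, leaf labels) are shared, which is the property that makes the decision tree setting attractive for cross-silo FL.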
Index Terms
- Cross-silo federated learning based decision trees