ABSTRACT
Most research and applications in machine learning focus on training a model for a particular task, such as churn prediction, using training data held on a single machine or in a data center. In many organizations and industries, however, the training data resides in different, isolated locations. To protect data privacy and security, it is not feasible to gather all the training data in one place and train machine learning models centrally. Federated Learning (FL) is a machine learning technique whose goal is to learn a high-quality model across multiple clients (such as mobile devices) or data centers without ever exchanging their training data. Most existing research on FL focuses on two directions: (a) training parametric models such as neural networks and (b) FL setups with millions of clients. In this work, by contrast, we focus on non-parametric models such as decision trees; specifically, we build decision trees using federated learning and train a random forest model. Our work aims to involve corporate companies, rather than mobile devices, in the federated learning process. We consider a setting where a small number of organizations or companies collaboratively build machine learning models without exchanging their privately held large data sets. We design a federated decision tree-based random forest algorithm and evaluate it on several datasets. Our results demonstrate that every participating company benefits from federated learning through improved model performance. We also show how to incorporate differential privacy into our decision tree-based random forest algorithm.
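To make the cross-silo idea concrete, the following is a minimal, self-contained sketch (not the paper's actual algorithm): each silo fits a decision tree on its private data, only the fitted trees are exchanged, and the combined forest predicts by majority vote. For brevity the "trees" here are depth-1 stumps and the helper names (`train_stump`, `predict`) are illustrative; a real system would use full-depth trees, feature and sample bagging, and differentially private noise.

```python
from collections import Counter

def train_stump(X, y):
    """Train a depth-1 decision stump on one silo's local data:
    pick the (feature, threshold) split minimizing misclassifications."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            # majority class on each side of the threshold
            l_pred = Counter(left).most_common(1)[0][0]
            r_pred = Counter(right).most_common(1)[0][0]
            errors = sum(1 for row, yi in zip(X, y)
                         if (l_pred if row[f] <= t else r_pred) != yi)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_pred, r_pred)
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict(forest, row):
    """Majority vote over all stumps contributed by all silos."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]

# Two silos with private labeled data; only fitted stumps are exchanged.
silo_a = ([[1.0], [2.0], [8.0], [9.0]], [0, 0, 1, 1])
silo_b = ([[1.5], [2.5], [7.5], [8.5]], [0, 0, 1, 1])
forest = [train_stump(X, y) for X, y in (silo_a, silo_b)]
print(predict(forest, [8.0]))  # → 1
```

Note that no raw rows ever leave a silo; only the split parameters (feature index, threshold, leaf labels) are shared, which is the property that makes the decision tree setting attractive for cross-silo FL.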
Index Terms
- Cross-silo federated learning based decision trees