Abstract
Many Federated Learning (FL) algorithms have focused on linear, kernel-based, and neural-network-based models. Recently, however, interest has grown in tree-based models, such as Random Forest and gradient boosted trees like XGBoost, owing to their simplicity, robust performance, and interpretability in a variety of applications. In this chapter, we introduce recent innovations, techniques, and implementations specific to tree-based algorithms. We highlight how these tree-based methods differ from many existing FL methods and describe some of their key advantages over other FL algorithms.
Notes
- 1.
- 2. For an in-depth look at the various cryptographic techniques, see Chap. 14.
- 3.
- 4.
- 5. Note: While 𝜖 is used here as the notation for the histogram approximation error, it should not be confused with the 𝜖 used in differential privacy (DP).
- 6.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ong, Y.J., Baracaldo, N., Zhou, Y. (2022). Tree-Based Models for Federated Learning Systems. In: Ludwig, H., Baracaldo, N. (eds) Federated Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-96896-0_2
DOI: https://doi.org/10.1007/978-3-030-96896-0_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96895-3
Online ISBN: 978-3-030-96896-0
eBook Packages: Computer Science, Computer Science (R0)