Tree-Based Models for Federated Learning Systems

Ong, Yuya Jeremy; Baracaldo, Nathalie; Zhou, Yi

doi:10.1007/978-3-030-96896-0_2

Abstract

Many Federated Learning (FL) algorithms have focused on linear, kernel-based, and neural-network-based models. Recently, however, tree-based models such as Random Forest and Gradient Boosted Trees (for example, XGBoost) have attracted growing interest because of their simplicity, robust performance, and interpretability in various applications. In this chapter, we introduce recent innovations, techniques, and implementations specific to tree-based algorithms. We highlight how these tree-based methods differ from many existing FL methods and describe some of the key advantages they offer over other FL algorithms.

Notes

1. https://www.kaggle.com/
2. For an in-depth look at the various cryptographic techniques, see Chap. 14.
3. https://fate.fedai.org/
4. https://github.com/mc2-project/secure-xgboost
5. Note: While 𝜖 is used here to denote the histogram approximation error, it should not be confused with the 𝜖 used in differential privacy (DP).
6. https://www.kaggle.com/giovamata/airlinedelaycauses

Author information

Authors and Affiliations

IBM Research – Almaden, San Jose, CA, USA
Yuya Jeremy Ong, Nathalie Baracaldo & Yi Zhou

Corresponding author

Correspondence to Yuya Jeremy Ong.

Editor information

Editors and Affiliations

IBM Research – Almaden, San Jose, CA, USA
Heiko Ludwig & Nathalie Baracaldo

About this chapter

Cite this chapter

Ong, Y.J., Baracaldo, N., Zhou, Y. (2022). Tree-Based Models for Federated Learning Systems. In: Ludwig, H., Baracaldo, N. (eds) Federated Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-96896-0_2

DOI: https://doi.org/10.1007/978-3-030-96896-0_2
Published: 08 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96895-3
Online ISBN: 978-3-030-96896-0
eBook Packages: Computer Science, Computer Science (R0)
