Tree-Based Models for Federated Learning Systems

Federated Learning

Abstract

Much Federated Learning (FL) research has focused on linear, kernel-based, and neural-network-based models. Recently, however, interest has grown in tree-based models, such as Random Forest and Gradient Boosted Trees (e.g., XGBoost), because of their simplicity, robust performance, and interpretability across a range of applications. In this chapter, we introduce recent innovations, techniques, and implementations specific to tree-based algorithms. We highlight how these tree-based methods differ from many existing FL methods and describe some of their key advantages over other Federated Learning algorithms.

Notes

  1. https://www.kaggle.com/

  2. For an in-depth look at the various cryptographic techniques, see Chap. 14.

  3. https://fate.fedai.org/

  4. https://github.com/mc2-project/secure-xgboost

  5. Note: While 𝜖 is used here to denote the histogram approximation error, it does not collide with the notation used in differential privacy (DP).

  6. https://www.kaggle.com/giovamata/airlinedelaycauses

Corresponding author

Correspondence to Yuya Jeremy Ong.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ong, Y.J., Baracaldo, N., Zhou, Y. (2022). Tree-Based Models for Federated Learning Systems. In: Ludwig, H., Baracaldo, N. (eds) Federated Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-96896-0_2

  • DOI: https://doi.org/10.1007/978-3-030-96896-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96895-3

  • Online ISBN: 978-3-030-96896-0

  • eBook Packages: Computer Science (R0)
