Skip to main content

SecureBoost\(+\): Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree

  • Conference paper
  • First Online:
Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14647))

Included in the following conference series:

  • 766 Accesses

Abstract

Gradient boosting decision tree (GBDT) is an ensemble machine learning algorithm that is widely used in industry. Due to the problem of data isolation and the requirement of privacy, many works try to use vertical federated learning to train machine learning models collaboratively between different data owners. SecureBoost is one of the most popular vertical federated learning algorithms for GBDT. However, to achieve privacy preservation, SecureBoost involves complex training procedures and time-consuming cryptography operations. This causes SecureBoost to be slow to train and does not scale to large-scale data. In this work, we propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework. SecureBoost+ is secure in the semi-honest model, which is the same as SecureBoost. SecureBoost+ can be scaled up to tens of millions of data samples faster than SecureBoost. SecureBoost+ achieves high performance through several novel optimizations for SecureBoost, including ciphertext operation optimization and the introduction of new training mechanisms. The experimental results show that SecureBoost+ is 6–35x faster than SecureBoost but with the same accuracy and can be scaled up to tens of millions of data samples and thousands of feature dimensions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    https://github.com/FederatedAI/FATE/tree/master/examples/data.

References

  1. Cao, S., Yang, X., Chen, C., Zhou, J., Li, X., Qi, Y.: Titant: online real-time transaction fraud detection in ant financial. arXiv preprint arXiv:1906.07407 (2019)

  2. Chai, D., Wang, L., Chen, K., Yang, Q.: Secure federated matrix factorization. IEEE Intell. Syst. (2020)

    Google Scholar 

  3. Chen, C., et al.: When homomorphic encryption marries secret sharing: secure large-scale sparse logistic regression and applications in risk control. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2652–2662 (2021)

    Google Scholar 

  4. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)

    Google Scholar 

  5. Cheng, K., et al.: SecureBoost: a lossless federated learning framework. IEEE Intell. Syst. 36(6), 87–98 (2021)

    Article  Google Scholar 

  6. Dorogush, A.V., Ershov, V., Gulin, A.: CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363 (2018)

  7. Fu, F., Jiang, J., Shao, Y., Cui, B.: An experimental evaluation of large scale GBDT systems. arXiv preprint arXiv:1907.01882 (2019)

  8. Fu, F., et al.: VF2Boost: very fast vertical federated gradient boosting for cross-enterprise learning. In: Proceedings of the 2021 International Conference on Management of Data, pp. 563–576 (2021)

    Google Scholar 

  9. Hardy, S., et al.: Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677 (2017)

  10. He, Y., et al.: A hybrid self-supervised learning framework for vertical federated learning. arXiv preprint arXiv:2208.08934 (2022)

  11. Kairouz, P., et al.: Advances and open problems in federated learning. Found. Trends® Mach. Learn. 14(1–2), 1–210 (2021)

    Google Scholar 

  12. Kang, Y., He, Y., Luo, J., Fan, T., Liu, Y., Yang, Q.: Privacy-preserving federated adversarial domain adaptation over feature groups for interpretability. IEEE Trans. Big Data (2022)

    Google Scholar 

  13. Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, vol. 30, pp. 3146–3154 (2017)

    Google Scholar 

  14. Li, Q., Wen, Z., He, B.: Practical federated gradient boosting decision trees. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 4642–4649 (2020)

    Google Scholar 

  15. Liu, Y., Fan, T., Chen, T., Xu, Q., Yang, Q.: Fate: an industrial grade platform for collaborative learning with data protection. J. Mach. Learn. Res. 22(226), 1–6 (2021). http://jmlr.org/papers/v22/20-815.html

  16. Liu, Y., Kang, Y., Xing, C., Chen, T., Yang, Q.: A secure federated transfer learning framework. IEEE Intell. Syst. 35(4), 70–82 (2020)

    Article  Google Scholar 

  17. Liu, Y., et al.: Vertical federated learning: concepts, advances and challenges. arXiv preprint arXiv:2211.12814 (2022)

  18. McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics, pp. 1273–1282. PMLR (2017)

    Google Scholar 

  19. Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X_16

    Chapter  Google Scholar 

  20. Shahbazi, Z., Byun, Y.C.: Product recommendation based on content-based filtering using XGBoost classifier. Int. J. Adv. Sci. Technol 29, 6979–6988 (2019)

    Google Scholar 

  21. Wang, X., He, X., Feng, F., Nie, L., Chua, T.S.: Tem: tree-enhanced embedding model for explainable recommendation. In: Proceedings of the 2018 World Wide Web Conference, pp. 1543–1552 (2018)

    Google Scholar 

  22. Yang, K., Fan, T., Chen, T., Shi, Y., Yang, Q.: A quasi-newton method based vertical federated learning framework for logistic regression. arXiv preprint arXiv:1912.00513 (2019)

  23. Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–19 (2019)

    Article  Google Scholar 

  24. Zhang, C., Li, S., Xia, J., Wang, W., Yan, F., Liu, Y.: BatchCrypt: efficient homomorphic encryption for cross-silo federated learning. In: 2020 \(\{\)USENIX\(\}\) Annual Technical Conference (\(\{\)USENIX\(\}\)\(\{\)ATC\(\}\) 20), pp. 493–506 (2020)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tao Fan .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Fan, T., Chen, W., Ma, G., Kang, Y., Fan, L., Yang, Q. (2024). SecureBoost\(+\): Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science(), vol 14647. Springer, Singapore. https://doi.org/10.1007/978-981-97-2259-4_18

Download citation

  • DOI: https://doi.org/10.1007/978-981-97-2259-4_18

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2261-7

  • Online ISBN: 978-981-97-2259-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics