SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree

Fan, Tao; Chen, Weijing; Ma, Guoqiang; Kang, Yan; Fan, Lixin; Yang, Qiang

doi:10.1007/978-981-97-2259-4_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14647)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Gradient boosting decision tree (GBDT) is an ensemble machine learning algorithm that is widely used in industry. Because of data isolation and privacy requirements, many works use vertical federated learning to train machine learning models collaboratively across different data owners. SecureBoost is one of the most popular vertical federated learning algorithms for GBDT. However, to preserve privacy, SecureBoost involves complex training procedures and time-consuming cryptographic operations. As a result, SecureBoost is slow to train and does not scale to large datasets. In this work, we propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework. Like SecureBoost, SecureBoost+ is secure in the semi-honest model. SecureBoost+ scales to tens of millions of data samples and trains faster than SecureBoost. It achieves high performance through several novel optimizations to SecureBoost, including ciphertext operation optimization and new training mechanisms. Experimental results show that SecureBoost+ is 6–35x faster than SecureBoost with the same accuracy, and that it scales to tens of millions of data samples and thousands of feature dimensions.
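
The abstract attributes SecureBoost's slowness to its cryptographic operations. To make that cost concrete, below is a minimal sketch of one SecureBoost-style split-finding round, assuming the protocol described by Cheng et al. for SecureBoost: the active (label-holding) party encrypts per-sample gradients g and hessians h with additively homomorphic Paillier encryption, a passive party sums those ciphertexts per feature bin, and the active party decrypts the bin sums and scores candidate splits with the XGBoost gain. It is not SecureBoost+ code. The toy key built from two Mersenne primes, the fixed-point SCALE, reg_lambda, the pre-binned feature, and the names encrypt, decrypt, add, passive_histogram and best_split are all hypothetical choices for illustration, and the key is far too weak for real use.

```python
import math
import random

# Toy Paillier key from two Mersenne primes (hypothetical parameters).
# Far too small and structured for real use; chosen so the sketch runs
# with the standard library only.
P = 2**127 - 1
Q = 2**107 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # valid for the generator g = n + 1
SCALE = 2**32  # fixed-point scale for real-valued g and h


def encrypt(x: float, rng: random.Random) -> int:
    m = round(x * SCALE) % N  # negative values wrap around modulo n
    r = rng.randrange(1, N)
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> float:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    if m > N // 2:  # upper half of Z_n encodes negative values
        m -= N
    return m / SCALE


def add(c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts.
    return c1 * c2 % N2


def passive_histogram(enc_g, enc_h, bin_ids, n_bins):
    # Passive party: sums encrypted g and h per bin without decrypting them.
    hist_g = [1] * n_bins  # 1 is a valid (trivial) encryption of 0
    hist_h = [1] * n_bins
    for cg, ch, b in zip(enc_g, enc_h, bin_ids):
        hist_g[b] = add(hist_g[b], cg)
        hist_h[b] = add(hist_h[b], ch)
    return hist_g, hist_h


def best_split(hist_g, hist_h, reg_lambda=1.0):
    # Active party: decrypt bin sums, then score every split point with
    # the XGBoost gain 0.5 * (GL^2/(HL+l) + GR^2/(HR+l) - G^2/(H+l)).
    g_bins = [decrypt(c) for c in hist_g]
    h_bins = [decrypt(c) for c in hist_h]
    g_tot, h_tot = sum(g_bins), sum(h_bins)

    def score(g, h):
        return g * g / (h + reg_lambda)

    best_gain, best_bin = -math.inf, -1
    g_left = h_left = 0.0
    for k in range(len(g_bins) - 1):  # left child = bins 0..k
        g_left += g_bins[k]
        h_left += h_bins[k]
        gain = 0.5 * (score(g_left, h_left)
                      + score(g_tot - g_left, h_tot - h_left)
                      - score(g_tot, h_tot))
        if gain > best_gain:
            best_gain, best_bin = gain, k
    return best_gain, best_bin


# Active party: labels y, initial prediction p = 0.5 under logistic loss,
# so g = p - y and h = p * (1 - p).
rng = random.Random(0)
y = [0, 0, 0, 1, 1, 1]
g = [0.5 - yi for yi in y]
h = [0.25] * len(y)
enc_g = [encrypt(v, rng) for v in g]
enc_h = [encrypt(v, rng) for v in h]

# Passive party: owns one feature, assumed already discretised into 6 bins.
bin_ids = [0, 1, 2, 3, 4, 5]
hist_g, hist_h = passive_histogram(enc_g, enc_h, bin_ids, n_bins=6)

gain, split_bin = best_split(hist_g, hist_h)
print(f"best split after bin {split_bin}, gain {gain:.4f}")
```

This prints `best split after bin 2, gain 1.2857`: the split after bin 2 separates the three negative labels from the three positive ones, giving the highest gain (9/7). Every sample costs two encryptions and every bin two decryptions, each a modular exponentiation on numbers the size of n², which is the kind of ciphertext cost the abstract identifies as SecureBoost's bottleneck.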

Similar content being viewed by others

Gradient Boosting Forest: a Two-Stage Ensemble Method Enabling Federated Learning of GBDTs

New Approaches to Federated XGBoost Learning for Privacy-Preserving Data Analysis

SecureCut: Federated Gradient Boosting Decision Trees with Efficient Machine Unlearning

Notes

1. https://github.com/FederatedAI/FATE/tree/master/examples/data.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, HKUST, Hong Kong, China
Tao Fan & Qiang Yang
WeBank, Shenzhen, China
Tao Fan, Weijing Chen, Guoqiang Ma, Yan Kang, Lixin Fan & Qiang Yang

Corresponding author

Correspondence to Tao Fan.

Editor information

Editors and Affiliations

Academia Sinica, Taipei, Taiwan
De-Nian Yang
Microsoft Research Asia, Beijing, China
Xing Xie
National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Duke University, Durham, NC, USA
Jian Pei
National Cheng Kung University, Tainan, Taiwan
Jen-Wei Huang
Silesian University of Technology, Gliwice, Poland
Jerry Chun-Wei Lin

About this paper

Cite this paper

Fan, T., Chen, W., Ma, G., Kang, Y., Fan, L., Yang, Q. (2024). SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14647. Springer, Singapore. https://doi.org/10.1007/978-981-97-2259-4_18

DOI: https://doi.org/10.1007/978-981-97-2259-4_18
Published: 25 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2261-7
Online ISBN: 978-981-97-2259-4
eBook Packages: Computer Science, Computer Science (R0)
