Abstract
Gradient boosting decision tree (GBDT) is an ensemble machine learning algorithm that is widely used in industry. Due to the problem of data isolation and the requirement of privacy, many works try to use vertical federated learning to train machine learning models collaboratively between different data owners. SecureBoost is one of the most popular vertical federated learning algorithms for GBDT. However, to achieve privacy preservation, SecureBoost involves complex training procedures and time-consuming cryptography operations. This causes SecureBoost to be slow to train and does not scale to large-scale data. In this work, we propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework. SecureBoost+ is secure in the semi-honest model, which is the same as SecureBoost. SecureBoost+ can be scaled up to tens of millions of data samples faster than SecureBoost. SecureBoost+ achieves high performance through several novel optimizations for SecureBoost, including ciphertext operation optimization and the introduction of new training mechanisms. The experimental results show that SecureBoost+ is 6–35x faster than SecureBoost but with the same accuracy and can be scaled up to tens of millions of data samples and thousands of feature dimensions.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Cao, S., Yang, X., Chen, C., Zhou, J., Li, X., Qi, Y.: Titant: online real-time transaction fraud detection in ant financial. arXiv preprint arXiv:1906.07407 (2019)
Chai, D., Wang, L., Chen, K., Yang, Q.: Secure federated matrix factorization. IEEE Intell. Syst. (2020)
Chen, C., et al.: When homomorphic encryption marries secret sharing: secure large-scale sparse logistic regression and applications in risk control. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2652–2662 (2021)
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)
Cheng, K., et al.: SecureBoost: a lossless federated learning framework. IEEE Intell. Syst. 36(6), 87–98 (2021)
Dorogush, A.V., Ershov, V., Gulin, A.: CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363 (2018)
Fu, F., Jiang, J., Shao, Y., Cui, B.: An experimental evaluation of large scale GBDT systems. arXiv preprint arXiv:1907.01882 (2019)
Fu, F., et al.: VF2Boost: very fast vertical federated gradient boosting for cross-enterprise learning. In: Proceedings of the 2021 International Conference on Management of Data, pp. 563–576 (2021)
Hardy, S., et al.: Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677 (2017)
He, Y., et al.: A hybrid self-supervised learning framework for vertical federated learning. arXiv preprint arXiv:2208.08934 (2022)
Kairouz, P., et al.: Advances and open problems in federated learning. Found. Trends® Mach. Learn. 14(1–2), 1–210 (2021)
Kang, Y., He, Y., Luo, J., Fan, T., Liu, Y., Yang, Q.: Privacy-preserving federated adversarial domain adaptation over feature groups for interpretability. IEEE Trans. Big Data (2022)
Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, vol. 30, pp. 3146–3154 (2017)
Li, Q., Wen, Z., He, B.: Practical federated gradient boosting decision trees. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 4642–4649 (2020)
Liu, Y., Fan, T., Chen, T., Xu, Q., Yang, Q.: Fate: an industrial grade platform for collaborative learning with data protection. J. Mach. Learn. Res. 22(226), 1–6 (2021). http://jmlr.org/papers/v22/20-815.html
Liu, Y., Kang, Y., Xing, C., Chen, T., Yang, Q.: A secure federated transfer learning framework. IEEE Intell. Syst. 35(4), 70–82 (2020)
Liu, Y., et al.: Vertical federated learning: concepts, advances and challenges. arXiv preprint arXiv:2211.12814 (2022)
McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics, pp. 1273–1282. PMLR (2017)
Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X_16
Shahbazi, Z., Byun, Y.C.: Product recommendation based on content-based filtering using XGBoost classifier. Int. J. Adv. Sci. Technol 29, 6979–6988 (2019)
Wang, X., He, X., Feng, F., Nie, L., Chua, T.S.: Tem: tree-enhanced embedding model for explainable recommendation. In: Proceedings of the 2018 World Wide Web Conference, pp. 1543–1552 (2018)
Yang, K., Fan, T., Chen, T., Shi, Y., Yang, Q.: A quasi-newton method based vertical federated learning framework for logistic regression. arXiv preprint arXiv:1912.00513 (2019)
Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–19 (2019)
Zhang, C., Li, S., Xia, J., Wang, W., Yan, F., Liu, Y.: BatchCrypt: efficient homomorphic encryption for cross-silo federated learning. In: 2020 \(\{\)USENIX\(\}\) Annual Technical Conference (\(\{\)USENIX\(\}\)\(\{\)ATC\(\}\) 20), pp. 493–506 (2020)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Fan, T., Chen, W., Ma, G., Kang, Y., Fan, L., Yang, Q. (2024). SecureBoost\(+\): Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science(), vol 14647. Springer, Singapore. https://doi.org/10.1007/978-981-97-2259-4_18
Download citation
DOI: https://doi.org/10.1007/978-981-97-2259-4_18
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2261-7
Online ISBN: 978-981-97-2259-4
eBook Packages: Computer ScienceComputer Science (R0)