Abstract
In a non-stationary data stream, concept drift occurs when different chunks of incoming data follow different distributions. Hence, over time, the global optimization point of a learning model may permanently drift to the point where the model no longer adequately performs the task it was designed for. This phenomenon must be addressed to maintain the integrity and effectiveness of a model over the long term. In this paper, we propose a simple but effective drift learning algorithm called elastic Gradient Boosting Decision Tree (eGBDT). Since the prediction of a GBDT model is the summed output of a list of trees, we can easily append new trees to perform incremental learning, or delete the last few trees to roll back to a previously known optimization point. The proposed eGBDT incrementally fits new data and detects drift by searching for the tree prefix with the lowest residual. If the rollback deletions required would exceed the initial number of trees, a retraining process is triggered. Comparisons of eGBDT with five state-of-the-art methods on eight data sets show the efficacy of eGBDT.
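The elastic mechanism described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the class name `ElasticGBDT`, the chunk-update and rollback method names, and all hyperparameter defaults are assumptions chosen for the sketch. It shows the three operations the abstract describes: appending trees fitted to the current residual (incremental learning), truncating the tree list at the prefix with the lowest residual (rollback), and retraining from scratch when the rollback would cut into the initial trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class ElasticGBDT:
    """Minimal sketch of an elastic GBDT for concept drift adaptation.

    The prediction is the sum of tree outputs, so incremental learning
    appends trees and rollback simply drops the most recent ones.
    """

    def __init__(self, n_init_trees=10, learning_rate=0.1, max_depth=3):
        self.n_init_trees = n_init_trees
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []
        self.base = 0.0  # constant initial prediction (mean of targets)

    def fit(self, X, y):
        # (Re)train from scratch on a data chunk.
        self.base = float(np.mean(y))
        self.trees = []
        self._boost(X, y, self.n_init_trees)
        return self

    def predict(self, X):
        pred = np.full(len(X), self.base)
        for tree in self.trees:
            pred += self.learning_rate * tree.predict(X)
        return pred

    def _boost(self, X, y, n_trees):
        # Standard gradient boosting step for squared loss:
        # each new tree fits the residual of the current ensemble.
        for _ in range(n_trees):
            residual = y - self.predict(X)
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residual)
            self.trees.append(tree)

    def update(self, X, y, n_new_trees=5):
        # Incremental learning: append trees fitted to the new chunk.
        self._boost(X, y, n_new_trees)

    def rollback_or_retrain(self, X, y):
        # Evaluate the squared residual after each prefix of trees and
        # keep the best prefix; if that prefix is shorter than the
        # initial ensemble, trigger a full retrain instead.
        pred = np.full(len(X), self.base)
        losses = [float(np.mean((y - pred) ** 2))]
        for tree in self.trees:
            pred += self.learning_rate * tree.predict(X)
            losses.append(float(np.mean((y - pred) ** 2)))
        best = int(np.argmin(losses))  # number of trees to keep
        if best < self.n_init_trees:
            self.fit(X, y)
        else:
            self.trees = self.trees[:best]
```

Under this sketch, a stream is processed chunk by chunk: call `update` on each new chunk, then `rollback_or_retrain` on held-out data from that chunk to either trim drifted trees or rebuild the model.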
Acknowledgments
This work was supported by the Australian Research Council through the Discovery Project under Grant DP190101733.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Wang, K., Liu, A., Lu, J., Zhang, G., Xiong, L. (2020). An Elastic Gradient Boosting Decision Tree for Concept Drift Learning. In: Gallagher, M., Moustafa, N., Lakshika, E. (eds) AI 2020: Advances in Artificial Intelligence. AI 2020. Lecture Notes in Computer Science(), vol 12576. Springer, Cham. https://doi.org/10.1007/978-3-030-64984-5_33
Print ISBN: 978-3-030-64983-8
Online ISBN: 978-3-030-64984-5