
Regularized and incremental decision trees for data streams


Abstract

Decision trees are a widely used family of methods for learning predictive models from both batch and streaming data. Despite achieving positive results in a multitude of applications, incremental decision trees continuously grow in terms of nodes as new data becomes available, i.e., they eventually split on all available features, and often multiple times on the same feature, thus leading to unnecessary complexity and overfitting. With this behavior, incremental trees lose the ability to generalize well, to be human-understandable, and to be computationally efficient. To tackle these issues, we proposed in a previous study a regularization scheme for Hoeffding decision trees that (i) uses a penalty factor to control the gain obtained by creating a new split node on a feature that has not been used thus far and (ii) uses information from previous splits in the current branch to determine whether the observed gain indeed justifies a new split. In this paper, we extend this analysis by applying the proposed regularization scheme to other types of incremental decision trees, reporting results in both synthetic and real-world scenarios. The main interest is to verify whether and how the proposed regularization scheme affects the different types of incremental trees. Results show that, in addition to the original Hoeffding Tree, the Adaptive Random Forest also benefits from regularization, whereas McDiarmid Trees and Extremely Fast Decision Trees exhibit declines in accuracy.
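To make the two ingredients of the scheme concrete, the following is a minimal Python sketch. It is not the implementation evaluated in the paper: the function names (penalized_gain, justifies_split), the penalty factor, and the min_ratio threshold are illustrative assumptions introduced here for exposition.

    def penalized_gain(raw_gain, feature, branch_features, penalty):
        # Ingredient (i): shrink the gain of a feature not yet used along
        # the current branch; penalty is a factor in (0, 1]. Features
        # already used keep their raw gain, so proven features are favored.
        if feature in branch_features:
            return raw_gain
        return penalty * raw_gain

    def justifies_split(candidate_gain, previous_gains, min_ratio=0.9):
        # Ingredient (ii): compare the candidate gain against the gains of
        # earlier splits in the same branch, and only allow the split if it
        # is at least comparable to what those splits achieved.
        if not previous_gains:
            return True  # no history yet (e.g., splitting at the root)
        return candidate_gain >= min_ratio * min(previous_gains)

    # Hypothetical usage at a leaf, before the usual split-confidence test:
    candidates = {"age": 0.12, "tenure": 0.15}   # raw split gains
    used = {"age", "salary"}                     # features already on this branch
    gains = {f: penalized_gain(g, f, used, penalty=0.5)
             for f, g in candidates.items()}
    best = max(gains, key=gains.get)
    if justifies_split(gains[best], previous_gains=[0.20, 0.18]):
        print("split on", best)

Under such a scheme, splits on fresh features happen only when their penalized gain still dominates and remains comparable to past splits, which is how, per the abstract, the regularization discourages unnecessary tree growth.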


Notes

  1. In practice, depending on the metric J being used, we should instead target its minimization. For instance, in CART-based trees [17], the goal would be to minimize the Gini impurity metric rather than maximize it, and the process should be adapted accordingly (the Gini impurity is recalled after these notes).

  2. For instance, the proposed scheme is the same for McDiarmid trees, except that the McDiarmid bound reported earlier in Eq. 3 is used instead of the Hoeffding bound given in Eq. 2 (the usual form of the latter is recalled after these notes).
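For reference, regarding Note 1: the Gini impurity of a node with class proportions p_1, ..., p_C is

    G = 1 - \sum_{c=1}^{C} p_c^2

It equals zero for a pure node and is maximal under a uniform class distribution, which is why a CART-style split search minimizes the weighted impurity of the resulting children rather than maximizing it.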
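Regarding Note 2, the following is a sketch of the usual form of the Hoeffding bound used in Hoeffding trees [5]; we assume Eq. 2 follows this form, where R is the range of the split metric, δ is the confidence parameter, and n is the number of observations seen at the leaf:

    \epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}

With probability 1 − δ, the true mean of the metric lies within ε of its empirical mean, so a split on the best-ranked feature is made once its gain exceeds the runner-up's by more than ε. The McDiarmid-based variant [6] keeps this decision rule but substitutes a bound tailored to the specific split metric.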

References

  1. Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604

  2. Barddal JP, Gomes HM, Enembreck F, Pfahringer B, Bifet A (2016) On dynamic feature weighting for feature drifting data streams. In: ECML/PKDD’16, Lecture Notes in Computer Science. Springer, New York

  3. Bahri M, Maniu S, Bifet A (2018) A sketch-based naive Bayes algorithm for evolving data streams. In: 2018 IEEE International Conference on Big Data (Big Data), pp 604–613

  4. Krawczyk B, Wozniak M (2015) Weighted naïve Bayes classifier with forgetting for drifting data streams. In: 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp 2147–2152

  5. Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’00. ACM, New York, NY, USA, pp 71–80. ISBN 1-58113-233-6. https://doi.org/10.1145/347090.347107

  6. Rutkowski L, Pietruczuk L, Duda P, Jaworski M (2013) Decision trees for mining data streams based on the McDiarmid’s bound. IEEE Trans Knowl Data Eng 25(6):1272–1279. ISSN 1041-4347. https://doi.org/10.1109/TKDE.2012.66

  7. Amezzane I, Fakhri Y, Aroussi ME, Bakhouya M (2019) Comparative study of batch and stream learning for online smartphone-based human activity recognition. In: Ahmed MB, Boudhir AA, Younes A (eds) Innovations in Smart Cities Applications Edition 2. Springer International Publishing, Cham, pp 557–571. ISBN 978-3-030-11196-0

  8. Bifet A, Frank E, Holmes G, Pfahringer B (2012) Ensembles of restricted Hoeffding trees. ACM Trans Intell Syst Technol 3(2):30:1–30:20. ISSN 2157-6904. https://doi.org/10.1145/2089094.2089106

  9. Yang H, Fong S (2011) Optimized very fast decision tree with balanced classification accuracy and compact tree size, pp 57–64

  10. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Royal Statist Soc Series B (Methodological) 58(1):267–288. ISSN 00359246. http://www.jstor.org/stable/2346178

  11. Barddal JP, Enembreck F (2019) Learning regularized Hoeffding trees from data streams. In: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, SAC ’19. ACM, New York, NY, USA, pp 574–581. ISBN 978-1-4503-5933-7. https://doi.org/10.1145/3297280.3297334

  12. Manapragada C, Webb GI, Salehi M (2018) Extremely fast decision tree. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18. ACM, New York, NY, USA, pp 1953–1962. ISBN 978-1-4503-5552-0. https://doi.org/10.1145/3219819.3220005

  13. Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfahringer B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 106(9):1469–1495. ISSN 1573-0565. https://doi.org/10.1007/s10994-017-5642-8

  14. Ikonomovska E, Gama J, Džeroski S (2011a) Learning model trees from evolving data streams. Data Min Knowl Discov 23(1):128–168. ISSN 1573-756X. https://doi.org/10.1007/s10618-010-0201-y

  15. Ikonomovska E, Gama J, Ženko B, Džeroski S (2011b) Speeding-up Hoeffding-based regression trees with options. In: ICML, pp 537–544

  16. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101. ISSN 0885-6125. https://doi.org/10.1023/A:1018046501280

  17. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth and Brooks, Monterey, CA

  18. da Costa VGT, de Leon Ferreira de Carvalho ACP, Barbon Jr. S (2018) Strict very fast decision tree: a memory conservative algorithm for data stream mining. Patt Recog Lett 116:22–28. ISSN 0167-8655. https://doi.org/10.1016/j.patrec.2018.09.004. http://www.sciencedirect.com/science/article/pii/S0167865518305580

  19. Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’01. ACM, New York, NY, USA, pp 97–106. ISBN 1-58113-391-X. https://doi.org/10.1145/502512.502529

  20. Bifet A, Gavaldà R (2009) Adaptive learning from evolving data streams. Springer, Berlin, pp 249–260. ISBN 978-3-642-03915-7. https://doi.org/10.1007/978-3-642-03915-7_22

  21. Breiman L (2001) Random forests. Mach Learn 45(1):5–32. ISSN 0885-6125. https://doi.org/10.1023/A:1010933404324

  22. Jankowski D, Jackowski K (2016) Learning decision trees from data streams with concept drift. Procedia Comput Sci 80:1682–1691. ISSN 1877-0509. https://doi.org/10.1016/j.procs.2016.05.508. http://www.sciencedirect.com/science/article/pii/S1877050916309954. International Conference on Computational Science 2016, ICCS 2016, 6–8 June 2016, San Diego, California, USA

  23. Deng H, Runger G (2012) Feature selection via regularized trees. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp 1–8. https://doi.org/10.1109/IJCNN.2012.6252640

  24. Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: Bazzan AC, Labidi S (eds) Advances in Artificial Intelligence – SBIA 2004, Lecture Notes in Computer Science, vol 3171. Springer, Berlin, pp 286–295. ISBN 978-3-540-23237-7. https://doi.org/10.1007/978-3-540-28645-5_29

  25. Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925. ISSN 1041-4347. https://doi.org/10.1109/69.250074

  26. Enembreck F, Ávila BC, Scalabrin EE, Barthès JPA (2007) Learning drifting negotiations. Appl Artif Intell 21(9):861–881. http://dblp.uni-trier.de/db/journals/aai/aai21.html#EnembreckASB07

  27. Harries M (1999) Splice-2 comparative evaluation: electricity pricing. Technical report, The University of New South Wales

  28. Blackard JA, Dean DJ (1999) Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Comput Elect Agri 24(3):131–151. ISSN 0168-1699. https://doi.org/10.1016/S0168-1699(99)00046-0. http://www.sciencedirect.com/science/article/pii/S0168169999000460

  29. Katakis I, Tsoumakas G, Vlahavas I (2006) Dynamic feature space and incremental feature selection for the classification of textual data streams. In: ECML/PKDD-2006 International Workshop on Knowledge Discovery from Data Streams. Springer, New York, p 107

  30. Barddal JP, Gomes HM, Enembreck F (2015) A survey on feature drift adaptation. In: Proceedings of the International Conference on Tools with Artificial Intelligence. IEEE

  31. Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30. http://www.jstor.org/stable/2282952

  32. Gomes HM, Barddal JP, Ferreira LEB, Bifet A (2018) Adaptive random forests for data stream regression. In: 26th European Symposium on Artificial Neural Networks, ESANN 2018, Bruges, Belgium, April 25-27, 2018. http://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2018-183.pdf

  33. Britto AS, Sabourin R, Oliveira LES (2014) Dynamic selection of classifiers—a comprehensive review. Patt Recog 47(11):3665–3680. ISSN 0031-3203. https://doi.org/10.1016/j.patcog.2014.05.003. http://www.sciencedirect.com/science/article/pii/S0031320314001885

  34. Cruz RMO, Sabourin R, Cavalcanti GDC (2014) Analyzing dynamic ensemble selection techniques using dissimilarity analysis. In: Gayar NE, Schwenker F, Suen C (eds) Artificial Neural Networks in Pattern Recognition. Springer International Publishing, Cham, pp 59–70. ISBN 978-3-319-11656-3

  35. Almeida PRLD, Oliveira LS, Britto ADS, Sabourin R (2016) Handling concept drifts using dynamic selection of classifiers. In: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), pp 989–995. https://doi.org/10.1109/ICTAI.2016.0153

  36. Zyblewski P, Ksieniewicz P, Woźniak M (2019) Classifier selection for highly imbalanced data streams with minority driven ensemble. In: Rutkowski L, Scherer R, Korytkowski M, Pedrycz W, Tadeusiewicz R, Zurada JM (eds) Artificial Intelligence and Soft Computing. Springer International Publishing, Cham, pp 626–635. ISBN 978-3-030-20912-4


Acknowledgments

The authors would like to thank the anonymous reviewers of ACM SAC 2019 for their constructive comments on our original manuscript, as well as the reviewers of the Annals of Telecommunications for their feedback on this manuscript. This research did not receive any financial support.

Author information

Corresponding author

Correspondence to Jean Paul Barddal.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Barddal, J.P., Enembreck, F. Regularized and incremental decision trees for data streams. Ann. Telecommun. 75, 493–503 (2020). https://doi.org/10.1007/s12243-020-00782-3

