
Hybrid decision trees for data streams based on Incremental Flexible Naive Bayes prediction at leaf nodes

  • Research Paper
  • Published in: Evolutionary Intelligence

Abstract

Mining data streams in a single pass with constant memory is a challenging task. Decision trees are among the most popular classifiers for both batch and incremental learning because of their high interpretability, ease of construction and good accuracy. The most popular decision tree for stream classification is the Hoeffding Tree, which is based on the Hoeffding bound. The literature also reports a few decision tree variants based on other bounds. By default, a decision tree predicts the class with the “majority class” approach. Prediction accuracy was later improved by a hybrid decision tree that uses a Naive Bayes classifier for prediction. Flexible Naive Bayes employs Kernel Density Estimation (KDE) for classification, but it is suited only to modeling static data sets. This paper proposes an Incremental Flexible Naive Bayes (IFNB) based hybrid decision tree paradigm that uses KDE to model continuous attributes at the leaf nodes of the tree to improve class prediction accuracy. Experimental results on both synthetic and real datasets show that the proposed IFNB based leaf classifiers improve on the class prediction methods adopted in existing decision trees for data streams.
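
To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the paper's IFNB algorithm. It shows the standard Hoeffding bound used to decide when a leaf has seen enough examples, and a naive Bayes leaf predictor that models each continuous attribute with a Gaussian kernel density estimate. The class name `FlexibleNBLeaf`, its methods, and the bandwidth heuristic h = 1/sqrt(n_c) are illustrative assumptions. The sketch also stores every observed value, so it does not meet the constant-memory requirement that a true incremental method would have to satisfy.

```python
import math
from collections import defaultdict


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability at least 1 - delta, the
    true mean is within epsilon of the mean of n bounded observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class FlexibleNBLeaf:
    """Hypothetical leaf predictor: naive Bayes with one Gaussian KDE per
    (class, attribute) pair. Assumption: all values are kept in memory."""

    def __init__(self, n_attributes):
        self.class_counts = defaultdict(int)
        # values[c][j] holds the attribute-j values observed with class c
        self.values = defaultdict(lambda: [[] for _ in range(n_attributes)])

    def update(self, x, y):
        """Absorb one labelled example (x is a list of floats)."""
        self.class_counts[y] += 1
        for j, v in enumerate(x):
            self.values[y][j].append(v)

    @staticmethod
    def _log_density(points, v):
        """Log of the Gaussian KDE at v; bandwidth h = 1/sqrt(n) is an
        assumed heuristic, not necessarily the paper's choice."""
        n = len(points)
        h = 1.0 / math.sqrt(n)
        total = sum(math.exp(-0.5 * ((v - p) / h) ** 2) for p in points)
        density = total / (n * h * math.sqrt(2.0 * math.pi))
        return math.log(max(density, 1e-300))  # floor avoids log(0)

    def predict(self, x):
        """Return the class maximising log prior + sum of log densities."""
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / total)
            for j, v in enumerate(x):
                score += self._log_density(self.values[c][j], v)
            if score > best_score:
                best, best_score = c, score
        return best


leaf = FlexibleNBLeaf(n_attributes=1)
for v in (0.0, 0.2, -0.1):
    leaf.update([v], "a")
for v in (5.0, 5.1, 4.8):
    leaf.update([v], "b")
print(leaf.predict([0.1]), leaf.predict([4.9]))
print(hoeffding_bound(1.0, 0.05, 1000))
```

In a hybrid tree, a predictor of this kind would replace the majority-class vote at each leaf, while the Hoeffding bound (or one of the alternative bounds) would still govern when a leaf is split.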

Author information

Corresponding author

Correspondence to C. Sweetlin Hemalatha.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sweetlin Hemalatha, C., Pathak, R. & Vaidehi, V. Hybrid decision trees for data streams based on Incremental Flexible Naive Bayes prediction at leaf nodes. Evol. Intel. 12, 515–526 (2019). https://doi.org/10.1007/s12065-019-00252-3

  • DOI: https://doi.org/10.1007/s12065-019-00252-3
