Hybrid decision trees for data streams based on Incremental Flexible Naive Bayes prediction at leaf nodes

Sweetlin Hemalatha, C.; Pathak, Ravi; Vaidehi, V.

doi:10.1007/s12065-019-00252-3

Research Paper
Published: 13 June 2019

Volume 12, pages 515–526, (2019)
Evolutionary Intelligence

C. Sweetlin Hemalatha¹,
Ravi Pathak² &
V. Vaidehi¹

Abstract

Mining data streams in a single pass with constant memory is a challenging task. Decision trees are among the most popular classifiers for both batch and incremental learning because of their high interpretability, ease of construction and good accuracy. The most popular decision tree for stream classification is the Hoeffding Tree, which is based on the Hoeffding bound. The literature also reports a few decision tree variants based on different bounds. By default, a decision tree predicts the class with the “majority class” approach. Prediction accuracy was later improved by a hybrid decision tree in which a Naive Bayes classifier is used for prediction. Flexible Naive Bayes employs Kernel Density Estimation (KDE) for classification; however, it is suited to modeling static data sets. This paper proposes an Incremental Flexible Naive Bayes (IFNB) based hybrid decision tree paradigm that uses KDE to model continuous attributes at the leaf nodes of the tree to improve class prediction accuracy. Experimental results on both synthetic and real datasets show that the proposed IFNB based leaf classifiers improve on the class prediction methods adopted in existing decision trees for data streams.
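
To make the idea of KDE-based Naive Bayes prediction at a leaf concrete, the following is a minimal illustrative sketch, not the authors' IFNB algorithm, whose details the abstract does not give. The class name FlexibleNBLeaf, its methods, and the bandwidth rule h = 1/sqrt(n) are assumptions chosen for illustration. This naive version stores every observed value, so its memory grows with the stream; a true streaming method would need to bound that.

```python
import math
from collections import defaultdict


class FlexibleNBLeaf:
    """Hypothetical leaf classifier: Naive Bayes with a Gaussian KDE
    per (class, attribute). Illustrative only, not the paper's IFNB."""

    def __init__(self, n_attributes):
        self.n_attributes = n_attributes
        self.class_counts = defaultdict(int)
        # values[c][j] holds the values of attribute j seen with class c
        self.values = defaultdict(lambda: [[] for _ in range(n_attributes)])

    def learn_one(self, x, y):
        """Incremental update: record one labelled instance."""
        self.class_counts[y] += 1
        for j, v in enumerate(x):
            self.values[y][j].append(float(v))

    def _density(self, samples, v):
        """Gaussian kernel density estimate at v from stored samples."""
        n = len(samples)
        h = 1.0 / math.sqrt(n)  # assumed bandwidth rule
        norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((v - s) / h) ** 2) for s in samples)

    def predict_one(self, x):
        """Return the class with the highest log posterior score."""
        total = sum(self.class_counts.values())
        if total == 0:
            return None  # nothing learned yet
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for j, v in enumerate(x):
                # small constant guards against log(0) far from the data
                score += math.log(self._density(self.values[c][j], v) + 1e-300)
            if score > best_score:
                best, best_score = c, score
        return best


leaf = FlexibleNBLeaf(n_attributes=1)
for v in (0.0, 0.2, -0.1):
    leaf.learn_one([v], "a")
for v in (5.0, 5.2, 4.8):
    leaf.learn_one([v], "b")
print(leaf.predict_one([0.1]), leaf.predict_one([4.9]))
```

In a hybrid tree, an object like this would sit at each leaf and replace the majority-class vote for instances routed to that leaf. Unlike a single Gaussian per attribute, the kernel estimate can follow multimodal attribute distributions, at the cost of the per-value storage noted above.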

Author information

Authors and Affiliations

School of Computing Science and Engineering, VIT, Chennai, India
C. Sweetlin Hemalatha & V. Vaidehi
Global Biodiversity Information Facility (GBIF), Secretariat Copenhagen, Copenhagen, Denmark
Ravi Pathak

Corresponding author

Correspondence to C. Sweetlin Hemalatha.

About this article

Cite this article

Sweetlin Hemalatha, C., Pathak, R. & Vaidehi, V. Hybrid decision trees for data streams based on Incremental Flexible Naive Bayes prediction at leaf nodes. Evol. Intel. 12, 515–526 (2019). https://doi.org/10.1007/s12065-019-00252-3

Received: 26 June 2018
Revised: 18 March 2019
Accepted: 27 May 2019
Published: 13 June 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s12065-019-00252-3
