Abstract
Mining data streams in a single pass with constant memory is a challenging task. Decision trees are among the most popular classifiers for both batch and incremental learning because of their high interpretability, ease of construction, and good accuracy. The most popular decision tree for stream classification is the Hoeffding Tree, which is based on the Hoeffding bound. The literature reports a few variants of decision trees based on different bounds. The default class prediction method adopted in decision trees is the “majority class” approach. Prediction accuracy was later improved by a hybrid decision tree in which a Naive Bayes classifier is used for prediction. Kernel Density Estimation (KDE) is employed in Flexible Naive Bayes for classification; however, it is suited to modeling static data sets. This paper proposes an Incremental Flexible Naive Bayes (IFNB) based hybrid decision tree paradigm that uses KDE to model continuous attributes at the leaf nodes of the tree in order to improve class prediction accuracy. Experimental results on both synthetic and real datasets show that the proposed IFNB-based leaf classifiers improve on the class prediction methods adopted in existing decision trees for data streams.
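To make the idea of a KDE-based Naive Bayes leaf classifier concrete, the following is a minimal Python sketch. It is not the IFNB algorithm proposed in the paper: the class name `FlexibleNaiveBayesSketch`, the Gaussian kernel, and the `1/sqrt(n)` bandwidth are all assumptions chosen for illustration. The sketch also stores every observed value, so it does not meet the constant-memory requirement; a stream-ready version would need a bounded summary of the kernels.

```python
import math
from collections import defaultdict


class FlexibleNaiveBayesSketch:
    """Illustrative leaf classifier (hypothetical, not the paper's IFNB).

    Keeps every observed value per (class, attribute), so memory grows
    with the stream; this is an assumption made to keep the sketch short.
    """

    def __init__(self, n_attributes):
        self.n_attributes = n_attributes
        self.class_counts = defaultdict(int)
        # values[label][j] holds the values seen for attribute j in class label
        self.values = defaultdict(lambda: [[] for _ in range(n_attributes)])

    def update(self, x, label):
        """Absorb one labelled example in a single pass."""
        self.class_counts[label] += 1
        for j, v in enumerate(x):
            self.values[label][j].append(float(v))

    @staticmethod
    def _kde(v, samples):
        """Gaussian kernel density estimate of v from the stored samples."""
        n = len(samples)
        h = 1.0 / math.sqrt(n)  # assumed bandwidth heuristic, shrinks with n
        total = sum(math.exp(-0.5 * ((v - s) / h) ** 2) for s in samples)
        return total / (n * h * math.sqrt(2.0 * math.pi))

    def predict(self, x):
        """Return the class with the highest log prior plus log densities."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for j, v in enumerate(x):
                density = self._kde(float(v), self.values[label][j])
                score += math.log(density + 1e-300)  # guard against log(0)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In a hybrid tree, one such model would sit at each leaf: `update` is called for every training example routed to that leaf, and `predict` replaces the majority-class vote when a test example reaches it.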
Cite this article
Sweetlin Hemalatha, C., Pathak, R. & Vaidehi, V. Hybrid decision trees for data streams based on Incremental Flexible Naive Bayes prediction at leaf nodes. Evol. Intel. 12, 515–526 (2019). https://doi.org/10.1007/s12065-019-00252-3
DOI: https://doi.org/10.1007/s12065-019-00252-3