Naive Bayes Classifiers That Perform Well with Continuous Variables

Bouckaert, Remco R.

doi:10.1007/978-3-540-30549-1_106

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

There are three main methods for handling continuous variables in naive Bayes classifiers: the normal method (a parametric approach), the kernel method (a non-parametric approach) and discretization. In this article, we perform a methodologically sound comparison of the three methods, which shows large differences between them, with no single method being universally better. This suggests that a method for selecting one of the three approaches to continuous variables could improve the overall performance of the naive Bayes classifier. We present three methods for implementing v-fold cross validation efficiently, one each for the normal, kernel and discretization methods. Empirical evidence suggests that selection using 10-fold cross validation (especially when repeated 10 times) can largely and significantly improve the overall performance of naive Bayes classifiers and consistently outperform any of the three popular methods for dealing with continuous variables on its own. This is remarkable, since selection among more classifiers does not consistently result in better accuracy.
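The selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it refits the model on every fold instead of using the efficient cross-validation schemes the abstract refers to, it swaps in simple equal-width binning for whatever discretization the paper uses, and the kernel bandwidth of 1/sqrt(n) is an assumption. All function names (`fit_normal`, `fit_kernel`, `fit_bins`, `select_method`) are hypothetical.

```python
import math
import random
from collections import defaultdict

# Each fit_* function receives one attribute's values within a class plus the
# attribute's values over the whole training set, and returns a likelihood function.

def fit_normal(values, all_values):
    """Normal method: a single Gaussian per attribute and class."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values) + 1e-6  # avoid zero variance
    return lambda x: math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_kernel(values, all_values):
    """Kernel method: average of Gaussian kernels (bandwidth 1/sqrt(n) is an assumption)."""
    h = 1.0 / math.sqrt(len(values))
    norm = len(values) * h * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-((x - v) / h) ** 2 / 2) for v in values) / norm

def fit_bins(values, all_values, bins=5):
    """Discretization: equal-width bins with Laplace smoothing (a stand-in technique)."""
    lo, hi = min(all_values), max(all_values)
    width = (hi - lo) / bins or 1.0
    def index(x):
        return min(bins - 1, max(0, int((x - lo) / width)))
    counts = [1.0] * bins
    for v in values:
        counts[index(v)] += 1
    total = sum(counts)
    return lambda x: counts[index(x)] / total

METHODS = {"normal": fit_normal, "kernel": fit_kernel, "discretization": fit_bins}

def train(X, y, fit):
    """Return {label: (prior, [likelihood function per attribute])}."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    return {
        label: (len(rows) / len(X),
                [fit([r[j] for r in rows], col) for j, col in enumerate(columns)])
        for label, rows in by_class.items()
    }

def predict(model, row):
    """Pick the label with the highest log posterior under the naive Bayes assumption."""
    def log_score(label):
        prior, dens = model[label]
        return math.log(prior) + sum(math.log(max(d(x), 1e-300)) for d, x in zip(dens, row))
    return max(model, key=log_score)

def cv_accuracy(X, y, fit, folds=10, seed=0):
    """Plain v-fold cross validation: refit on each training split, score the held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])
        keep = [i for i in idx if i not in test]
        model = train([X[i] for i in keep], [y[i] for i in keep], fit)
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)

def select_method(X, y, folds=10, repeats=1):
    """Return the method with the best mean cross-validated accuracy, plus all scores."""
    scores = {
        name: sum(cv_accuracy(X, y, fit, folds, seed) for seed in range(repeats)) / repeats
        for name, fit in METHODS.items()
    }
    return max(scores, key=scores.get), scores
```

Calling `select_method(X, y, folds=10, repeats=10)` on a list of numeric rows `X` and labels `y` corresponds to the repeated 10-fold setting mentioned in the abstract; the chosen method would then be retrained on all of the data.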

Author information

Authors and Affiliations

Computer Science Department, University of Waikato & Xtal Mountain Information Technology, New Zealand
Remco R. Bouckaert

Editor information

Editors and Affiliations

Faculty of Information Technology, Monash University, VIC 3800, Australia
Geoffrey I. Webb
Science, Engineering and Technology Portfolio, Royal Melbourne Institute of Technology, VIC 3001, Melbourne, Australia
Xinghuo Yu

About this paper

Cite this paper

Bouckaert, R.R. (2004). Naive Bayes Classifiers That Perform Well with Continuous Variables. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_106

DOI: https://doi.org/10.1007/978-3-540-30549-1_106
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24059-4
Online ISBN: 978-3-540-30549-1
eBook Packages: Computer Science, Computer Science (R0)
