Naive Bayes Classifiers That Perform Well with Continuous Variables

  • Conference paper
AI 2004: Advances in Artificial Intelligence (AI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Abstract

There are three main methods for handling continuous variables in naive Bayes classifiers: the normal method (a parametric approach), the kernel method (a nonparametric approach), and discretization. In this article, we perform a methodologically sound comparison of the three methods, which shows large differences between them and no single method that is universally better. This suggests that a method for selecting one of the three approaches to continuous variables could improve the overall performance of the naive Bayes classifier. We present three methods that efficiently implement v-fold cross validation for the normal, kernel and discretization methods, respectively. Empirical evidence suggests that selection using 10-fold cross validation (especially when repeated 10 times) can largely and significantly improve the overall performance of naive Bayes classifiers and consistently outperform any of the three popular methods for dealing with continuous variables on their own. This is remarkable, since selection among more classifiers does not consistently result in better accuracy.
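The selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the efficient implementation the paper presents: it refits the model for every prediction, and everything in it is an assumption made for illustration, including the function names, the 1/sqrt(n) kernel bandwidth, the equal-width binning with add-one smoothing (a simple stand-in for whatever discretization the paper uses), and the toy data.

```python
import math
import random
from collections import defaultdict

def normal(x, class_vals, all_vals):
    """Normal method: Gaussian fitted to the class's training values."""
    mu = sum(class_vals) / len(class_vals)
    var = sum((v - mu) ** 2 for v in class_vals) / len(class_vals) + 1e-6
    dens = math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return max(dens, 1e-300)  # floor avoids log(0) after underflow

def kernel(x, class_vals, all_vals):
    """Kernel method: average of Gaussian kernels centred on each value."""
    n = len(class_vals)
    h = 1.0 / math.sqrt(n)  # bandwidth choice is an assumption
    total = sum(math.exp(-((x - v) / h) ** 2 / 2) for v in class_vals)
    return max(total / (n * h * math.sqrt(2 * math.pi)), 1e-300)

def binned(x, class_vals, all_vals, bins=5):
    """Discretization: equal-width bins shared by all classes, add-one smoothing."""
    lo, hi = min(all_vals), max(all_vals)
    width = (hi - lo) / bins or 1.0
    def idx(v):
        return min(bins - 1, max(0, int((v - lo) / width)))
    count = sum(1 for v in class_vals if idx(v) == idx(x))
    return (count + 1) / (len(class_vals) + bins)

def predict(train, x, method):
    """Naive Bayes prediction: log prior plus per-attribute log likelihoods."""
    by_class = defaultdict(list)
    for feats, label in train:
        by_class[label].append(feats)
    best_label, best_score = None, -math.inf
    for label, rows in by_class.items():
        score = math.log(len(rows) / len(train))
        for j, xj in enumerate(x):
            class_col = [r[j] for r in rows]
            all_col = [f[j] for f, _ in train]
            score += math.log(method(xj, class_col, all_col))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cv_accuracy(data, method, folds=10, seed=0):
    """Plain (unstratified) v-fold cross validation accuracy."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    correct = 0
    for k in range(folds):
        test = rows[k::folds]
        train = [r for i, r in enumerate(rows) if i % folds != k]
        correct += sum(predict(train, x, method) == y for x, y in test)
    return correct / len(rows)

def select_method(data, methods, folds=10, repeats=1):
    """Pick the method with the best accuracy averaged over repeated CV runs."""
    scores = {
        name: sum(cv_accuracy(data, m, folds, seed=r) for r in range(repeats)) / repeats
        for name, m in methods.items()
    }
    return max(scores, key=scores.get), scores

METHODS = {"normal": normal, "kernel": kernel, "discretization": binned}

# Toy data (hypothetical): one continuous attribute, two well-separated classes.
data = [((i * 0.1,), 0) for i in range(20)] + [((5 + i * 0.1,), 1) for i in range(20)]
best, scores = select_method(data, METHODS, folds=10, repeats=2)
print(best, scores)
```

Passing `repeats=10` corresponds to the repeated 10-fold setting mentioned in the abstract, at ten times the cost in this naive form.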

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bouckaert, R.R. (2004). Naive Bayes Classifiers That Perform Well with Continuous Variables. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_106

  • DOI: https://doi.org/10.1007/978-3-540-30549-1_106

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24059-4

  • Online ISBN: 978-3-540-30549-1

  • eBook Packages: Computer Science, Computer Science (R0)
