
Handling numeric attributes when comparing Bayesian network classifiers: does the discretization method matter?


Abstract

Within the framework of Bayesian networks (BNs), most classifiers assume that the variables involved are discrete, but this assumption rarely holds in real problems. Despite the loss of information discretization entails, it is a direct, easy-to-use mechanism that offers several benefits: it can improve the run time of certain algorithms; it reduces the value set and hence the noise that may be present in the data; and some Bayesian methods can only deal with discrete variables. Hence, even though there are many ways to handle continuous variables other than discretization, it is still in common use. This paper presents a study of the impact of using different discretization strategies on a set of representative BN classifiers, over a sample of 26 datasets. For this comparison we have chosen Naive Bayes (NB) together with several other semi-Naive Bayes classifiers: Tree-Augmented Naive Bayes (TAN), k-Dependence Bayesian classifier (KDB), Aggregating One-Dependence Estimators (AODE) and Hybrid AODE (HAODE). We have also included an augmented Bayesian network built with a hill-climbing algorithm (BNHC). With this comparison we analyse to what extent the choice of discretization method affects classifier performance in terms of accuracy and bias-variance decomposition. Our main conclusion is that even if a discretization method produces different results for a particular dataset, it has little effect when classifiers are compared against each other: given a set of datasets, accuracy values may vary, but the classifier ranking is generally maintained. This is a very useful outcome; if the type of discretization applied is not decisive, future experiments can be run d times faster, d being the number of discretization methods considered.
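To make the experimental setup concrete, the following sketch discretizes each numeric attribute with two common unsupervised strategies (equal-width and equal-frequency binning) and cross-validates a Naive Bayes classifier under each. This is a minimal illustration, not the paper's actual pipeline: the use of scikit-learn, the two example datasets, and the choice of five bins are assumptions made here for illustration only.

```python
# Minimal sketch (assumption: scikit-learn; not the paper's pipeline).
# For each dataset, discretize the numeric attributes with two common
# unsupervised strategies and cross-validate Naive Bayes on the result.
from sklearn.datasets import load_iris, load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

N_BINS = 5  # illustrative choice, not taken from the paper

DISCRETIZERS = {
    "equal-width": KBinsDiscretizer(n_bins=N_BINS, encode="ordinal",
                                    strategy="uniform"),
    "equal-frequency": KBinsDiscretizer(n_bins=N_BINS, encode="ordinal",
                                        strategy="quantile"),
}

for ds_name, loader in [("iris", load_iris), ("wine", load_wine)]:
    X, y = loader(return_X_y=True)
    for disc_name, disc in DISCRETIZERS.items():
        # min_categories keeps CategoricalNB from failing when a CV fold
        # happens not to see every bin index during training.
        model = make_pipeline(disc, CategoricalNB(min_categories=N_BINS))
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{ds_name:5s}  NB + {disc_name:15s}  accuracy = {acc:.3f}")
```

Repeating this loop for every classifier under comparison and ranking the classifiers per dataset would then reveal whether the ranking is stable across discretization methods, which is precisely the question the paper addresses.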



Author information

Correspondence to Ana M. Martínez.


About this article

Cite this article

Flores, M.J., Gámez, J.A., Martínez, A.M. et al. Handling numeric attributes when comparing Bayesian network classifiers: does the discretization method matter? Appl Intell 34, 372–385 (2011). https://doi.org/10.1007/s10489-011-0286-z
