
Handling numeric attributes when comparing Bayesian network classifiers: does the discretization method matter?


Abstract

Within the framework of Bayesian networks (BNs), most classifiers assume that the variables involved are discrete, but this assumption rarely holds in real problems. Despite the loss of information discretization entails, it is a direct, easy-to-use mechanism that offers several benefits: it can improve the run time of certain algorithms; it reduces the value set and hence the noise that may be present in the data; and some Bayesian methods can only deal with discrete variables. Hence, even though there are many ways to handle continuous variables other than discretization, it is still in common use. This paper presents a study of the impact of using different discretization strategies on a set of representative BN classifiers, over a sample of 26 datasets. For this comparison we have chosen Naive Bayes (NB) together with several other semi-Naive Bayes classifiers: Tree-Augmented Naive Bayes (TAN), k-Dependence Bayesian classifier (KDB), Aggregating One-Dependence Estimators (AODE) and Hybrid AODE (HAODE). We have also included an augmented Bayesian network built with a hill-climbing algorithm (BNHC). With this comparison we analyse to what extent the choice of discretization method affects classifier performance in terms of accuracy and bias-variance decomposition. Our main conclusion is that even if a discretization method produces different results for a particular dataset, it has little effect when classifiers are compared against each other: given a set of datasets, accuracy values may vary, but the classifier ranking is generally maintained. This is a very useful outcome; if the type of discretization applied is not decisive, future experiments can be run d times faster, d being the number of discretization methods considered.
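To make the experimental setup concrete, the following sketch discretizes each numeric attribute with two common unsupervised strategies (equal-width and equal-frequency binning) and cross-validates a Naive Bayes classifier under each. This is a minimal illustration, not the paper's actual pipeline: the use of scikit-learn, the two example datasets, and the choice of five bins are assumptions made here for illustration only.

```python
# Minimal sketch (assumption: scikit-learn; not the paper's pipeline).
# For each dataset, discretize the numeric attributes with two common
# unsupervised strategies and cross-validate Naive Bayes on the result.
from sklearn.datasets import load_iris, load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

N_BINS = 5  # illustrative choice, not taken from the paper

DISCRETIZERS = {
    "equal-width": KBinsDiscretizer(n_bins=N_BINS, encode="ordinal",
                                    strategy="uniform"),
    "equal-frequency": KBinsDiscretizer(n_bins=N_BINS, encode="ordinal",
                                        strategy="quantile"),
}

for ds_name, loader in [("iris", load_iris), ("wine", load_wine)]:
    X, y = loader(return_X_y=True)
    for disc_name, disc in DISCRETIZERS.items():
        # min_categories keeps CategoricalNB from failing when a CV fold
        # happens not to see every bin index during training.
        model = make_pipeline(disc, CategoricalNB(min_categories=N_BINS))
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{ds_name:5s}  NB + {disc_name:15s}  accuracy = {acc:.3f}")
```

Repeating this loop for every classifier under comparison and ranking the classifiers per dataset would then reveal whether the ranking is stable across discretization methods, which is precisely the question the paper addresses.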



Author information

Correspondence to Ana M. Martínez.


About this article

Cite this article

Flores, M.J., Gámez, J.A., Martínez, A.M. et al. Handling numeric attributes when comparing Bayesian network classifiers: does the discretization method matter? Appl Intell 34, 372–385 (2011). https://doi.org/10.1007/s10489-011-0286-z
