
Alternative prior assumptions for improving the performance of naïve Bayesian classifiers

Published in: Data Mining and Knowledge Discovery

Abstract

The prior distribution of an attribute in a naïve Bayesian classifier is typically assumed to be a Dirichlet distribution; this is called the Dirichlet assumption. The variables in a Dirichlet random vector can never be positively correlated and must have the same confidence level as measured by normalized variance. The generalized Dirichlet and the Liouville distributions, both also defined on the unit simplex, include the Dirichlet distribution as a special case. These two multivariate distributions are employed to investigate the impact of the Dirichlet assumption in naïve Bayesian classifiers, and we propose methods to construct appropriate generalized Dirichlet and Liouville priors. Our experimental results on 18 data sets reveal that the generalized Dirichlet distribution achieves the best performance among the three distribution families. These results suggest not only that the Dirichlet assumption is inappropriate, but also that forcing all variables in a prior to be positively correlated can degrade the performance of the naïve Bayesian classifier.
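The Dirichlet property noted above (components can never be positively correlated) is easy to check empirically. The sketch below, with illustrative parameters not taken from the paper, samples a Dirichlet random vector and confirms that every pairwise correlation is negative:

```python
# Minimal sketch: components of a Dirichlet random vector are
# never positively correlated (arbitrary illustrative parameters).
import numpy as np

rng = np.random.default_rng(0)
alpha = [2.0, 3.0, 5.0]                      # concentration parameters
samples = rng.dirichlet(alpha, size=100_000)  # shape (100000, 3)

corr = np.corrcoef(samples, rowvar=False)     # 3x3 correlation matrix
off_diag = corr[~np.eye(3, dtype=bool)]       # the pairwise correlations
print(off_diag.max() < 0)  # True
```

Because a Dirichlet vector must sum to one, an increase in one component forces the others down, which is what induces the uniformly negative correlations; the generalized Dirichlet and Liouville families studied in the paper relax this constraint on the prior's correlation structure.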



Author information

Correspondence to Tzu-Tsung Wong.

Additional information

Responsible editor: Charles Elkan.

Cite this article

Wong, TT. Alternative prior assumptions for improving the performance of naïve Bayesian classifiers. Data Min Knowl Disc 18, 183–213 (2009). https://doi.org/10.1007/s10618-008-0101-6
