Constrained domain maximum likelihood estimation for naive Bayes text classification

Andrés-Ferrer, Jesús; Juan, Alfons

doi:10.1007/s10044-009-0149-y


THEORETICAL ADVANCES
Published: 26 February 2009

Volume 13, pages 189–196, (2010)

Pattern Analysis and Applications



Abstract

The naive Bayes assumption in text classification has the advantage of greatly simplifying maximum likelihood estimation of unknown class-conditional word occurrence probabilities. However, these estimates are usually modified by application of a heuristic parameter smoothing technique to avoid (over-fitted) null estimates. In this work, we advocate the reduction of the parameter domain instead of parameter smoothing. This leads to a constrained domain maximum likelihood estimation problem for which we provide an iterative algorithm that solves it optimally.
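The idea in the abstract can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the authors' published algorithm: it supposes that the reduced domain is a simple lower bound, p[d] >= eps, on every class-conditional word probability, and it uses a standard active-set iteration (rescale relative frequencies, clamp any estimate that falls below eps, repeat). The function name `constrained_mle` and the parameter `eps` are hypothetical.

```python
from typing import List, Sequence


def constrained_mle(counts: Sequence[float], eps: float) -> List[float]:
    """Maximise sum_d counts[d] * log(p[d]) s.t. sum(p) == 1 and p[d] >= eps.

    Assumption: the constrained domain is a plain lower bound eps on each
    word probability of one class; this is an illustrative choice.
    """
    n = len(counts)
    if n == 0 or any(c < 0 for c in counts) or sum(counts) <= 0:
        raise ValueError("need non-negative counts with a positive total")
    if not 0.0 <= eps <= 1.0 / n:
        raise ValueError("eps must lie in [0, 1/n] so the domain is non-empty")

    clamped = set()  # indices whose probability is fixed at eps
    while True:
        free = [d for d in range(n) if d not in clamped]
        if not free:
            # Only reachable through rounding when eps is (almost) 1/n.
            return [eps] * n
        mass = 1.0 - eps * len(clamped)       # probability left for free words
        total = sum(counts[d] for d in free)  # counts of the free words
        # Relative-frequency estimate, rescaled to the remaining mass.
        p = {d: mass * counts[d] / total for d in free}
        violators = [d for d in free if p[d] < eps]
        if not violators:
            # Every free estimate respects the bound, so we are done.
            return [p.get(d, eps) for d in range(n)]
        clamped.update(violators)  # fix violators at eps and re-estimate


if __name__ == "__main__":
    # Two unseen words would get null estimates under plain maximum likelihood.
    print(constrained_mle([3, 1, 0, 0], eps=0.1))
```

With counts [3, 1, 0, 0] and eps = 0.1, the unconstrained estimate (0.75, 0.25, 0, 0) violates the bound for the two unseen words. They are clamped at 0.1 and the remaining 0.8 of probability mass is shared in proportion to the counts, giving approximately (0.6, 0.2, 0.1, 0.1). Unlike additive smoothing, no estimate is altered unless the bound requires it.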



Acknowledgements

Work partially supported by the Spanish research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018), by the EC (FEDER), the Spanish MEC under grant TIN2006-15694-CO2-01 and the Valencian “Conselleria d’Empresa, Universitat i Ciència” under grant CTBPRA/2005/004.

Author information

Authors and Affiliations

DSIC/ITI, Universidad Politécnica de Valencia (UPV), Valencia, Spain
Jesús Andrés-Ferrer & Alfons Juan


Corresponding author

Correspondence to Jesús Andrés-Ferrer.


About this article

Cite this article

Andrés-Ferrer, J., Juan, A. Constrained domain maximum likelihood estimation for naive Bayes text classification. Pattern Anal Applic 13, 189–196 (2010). https://doi.org/10.1007/s10044-009-0149-y


Received: 07 April 2008
Accepted: 15 December 2008
Published: 26 February 2009
Issue Date: May 2010
DOI: https://doi.org/10.1007/s10044-009-0149-y
