Abstract
Multinomial naïve Bayes is a popular classifier used for a wide variety of applications. When applied to text classification, this classifier requires some form of smoothing when estimating parameters; Laplace smoothing is the typical choice, although researchers have proposed several other successful alternatives. In this paper, we show that common preprocessing techniques for text categorization have detrimental effects on several of these well-known smoothing methods. We also introduce a new form of smoothing for which these detrimental effects are less severe: ROSE smoothing, which can be derived from methods for cost-sensitive learning and imbalanced datasets. We show empirically on text data that ROSE smoothing performs well compared to known methods of smoothing, and is the only method tested that performs well regardless of the type of text preprocessing used. It is particularly effective compared to existing methods when the data is imbalanced.
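For reference, the following is a minimal sketch of the baseline the abstract mentions: multinomial naïve Bayes parameter estimation with Laplace (add-one) smoothing, where the smoothed estimate P(w|c) = (N_wc + α)/(N_c + αV) never assigns zero probability to an unseen word. The function names and the `alpha` parameter are illustrative assumptions, not the paper's implementation, and the ROSE method itself is defined in the paper's body rather than reproduced here.

```python
import numpy as np

def train_multinomial_nb(X, y, alpha=1.0):
    """Estimate multinomial naive Bayes parameters with Laplace smoothing.

    X: (n_docs, n_words) term-count matrix; y: (n_docs,) class labels.
    alpha=1.0 gives classic Laplace (add-one) smoothing; other positive
    values give Lidstone smoothing.
    """
    classes = np.unique(y)
    n_words = X.shape[1]
    log_prior = np.empty(len(classes))
    log_likelihood = np.empty((len(classes), n_words))
    for i, c in enumerate(classes):
        Xc = X[y == c]
        log_prior[i] = np.log(Xc.shape[0] / X.shape[0])
        # Smoothed estimate: P(w|c) = (count(w,c) + alpha) / (count(c) + alpha*V)
        word_counts = Xc.sum(axis=0)
        log_likelihood[i] = np.log((word_counts + alpha) /
                                   (word_counts.sum() + alpha * n_words))
    return classes, log_prior, log_likelihood

def predict(X, classes, log_prior, log_likelihood):
    # argmax over classes of log P(c) + sum_w count(w,d) * log P(w|c)
    scores = X @ log_likelihood.T + log_prior
    return classes[np.argmax(scores, axis=1)]
```

With alpha = 1 this is the standard Laplace-smoothed estimator; the paper's contribution concerns how such smoothing choices interact with text preprocessing and class imbalance.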
Cite this paper
Liu, A.Y., Martin, C.E. (2011). Smoothing Multinomial Naïve Bayes in the Presence of Imbalance. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol. 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_4