On Word Frequency Information and Negative Evidence in Naive Bayes Text Classification

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3230)

Abstract

The Naive Bayes classifier exists in different versions. One version, called the multi-variate Bernoulli or binary independence model, uses binary word occurrence vectors, while the multinomial model uses word frequency counts. Many publications cite this difference as the main reason for the superior performance of the multinomial Naive Bayes classifier. We argue that this is not true. We show that when all word frequency information is eliminated from the document vectors, the multinomial Naive Bayes model performs even better. Moreover, we argue that the main reason for the difference in performance is the way that negative evidence, i.e., evidence from words that do not occur in a document, is incorporated in the model. Therefore, this paper aims at a better understanding and a clarification of the difference between the two probabilistic models of Naive Bayes.
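The distinction drawn in the abstract can be made concrete with a minimal sketch (toy vocabulary and hand-picked, untrained probabilities, all assumed for illustration). Given the same binary document vector, the multinomial model scores only the words that occur, while the multi-variate Bernoulli model also accumulates negative evidence from absent words through the (1 − p) factors:

```python
import math

# Toy example contrasting the two Naive Bayes event models on a *binary*
# document vector. Vocabulary and per-class word probabilities are
# illustrative values, not estimated from any corpus.
vocab = ["ball", "goal", "vote", "law"]
p = {
    "sports":   {"ball": 0.4, "goal": 0.4, "vote": 0.1, "law": 0.1},
    "politics": {"ball": 0.1, "goal": 0.1, "vote": 0.4, "law": 0.4},
}

def multinomial_score(doc, c):
    # Multinomial model on binary vectors: sum log P(w | c) over the
    # words that are present; absent words contribute nothing.
    return sum(math.log(p[c][w]) for w in vocab if doc[w])

def bernoulli_score(doc, c):
    # Multi-variate Bernoulli model: absent words contribute
    # log(1 - P(w | c)), i.e. explicit negative evidence.
    return sum(math.log(p[c][w]) if doc[w] else math.log(1 - p[c][w])
               for w in vocab)

doc = {"ball": 1, "goal": 1, "vote": 0, "law": 0}

for c in ("sports", "politics"):
    print(c, round(multinomial_score(doc, c), 3),
          round(bernoulli_score(doc, c), 3))
```

Both models prefer the sports class here, but the Bernoulli scores differ from the multinomial ones precisely by the terms for the absent words "vote" and "law", which is the negative-evidence effect the paper identifies as the real source of the performance gap.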





Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schneider, K.-M. (2004). On Word Frequency Information and Negative Evidence in Naive Bayes Text Classification. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_42

  • DOI: https://doi.org/10.1007/978-3-540-30228-5_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23498-2

  • Online ISBN: 978-3-540-30228-5