Abstract
Though naive Bayes text classifiers are widely used because of their simplicity, techniques for improving their performance have rarely been studied. In this paper, we propose and evaluate some general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate the problems of the traditional multinomial approach to text classification. In addition, a mutual-information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters-21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.
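For orientation, the traditional multinomial baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of standard multinomial naive Bayes with add-one (Laplace) smoothing, not the estimation, normalization, or weighting schemes proposed in the paper. The function names, the `alpha` parameter, and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods per class.

    docs is a list of token lists; alpha is the additive smoothing
    constant (hypothetical default of 1.0, i.e. Laplace smoothing).
    """
    vocab = {w for doc in docs for w in doc}
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_doc_counts.items()}
    log_like = {}
    for c in class_doc_counts:
        # Denominator: all tokens seen in class c plus smoothing mass.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like


def classify(doc, log_prior, log_like):
    """Return the class maximizing log P(c) + sum over tokens of log P(w|c)."""
    def score(c):
        # Tokens outside the training vocabulary are skipped.
        return log_prior[c] + sum(
            log_like[c][w] for w in doc if w in log_like[c]
        )
    return max(log_prior, key=score)


# Toy data, invented for illustration only.
docs = [
    ["oil", "price", "rise"],
    ["oil", "barrel"],
    ["game", "team", "win"],
    ["team", "score"],
]
labels = ["energy", "energy", "sports", "sports"]
log_prior, log_like = train_multinomial_nb(docs, labels)
predicted = classify(["oil", "oil", "team"], log_prior, log_like)
```

In this baseline every token adds one log-likelihood term to the score, so the magnitude of the score grows with document length, and every word counts equally regardless of how informative it is.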
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, SB., Rim, HC., Yook, D., Lim, HS. (2002). Effective Methods for Improving Naive Bayes Text Classifiers. In: Ishizuka, M., Sattar, A. (eds) PRICAI 2002: Trends in Artificial Intelligence. PRICAI 2002. Lecture Notes in Computer Science, vol 2417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45683-X_45
DOI: https://doi.org/10.1007/3-540-45683-X_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44038-3
Online ISBN: 978-3-540-45683-4
eBook Packages: Springer Book Archive