Effective Methods for Improving Naive Bayes Text Classifiers

Kim, Sang-Bum; Rim, Hae-Chang; Yook, DongSuk; Lim, Heui-Seok

doi:10.1007/3-540-45683-X_45

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2417)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

Although naive Bayes text classifiers are widely used because of their simplicity, techniques for improving their performance have rarely been studied. In this paper, we propose and evaluate some general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate the problems of the traditional multinomial approach to text classification. In addition, a mutual-information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters-21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.
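For orientation, the multinomial naive Bayes baseline that the abstract refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the function names, the toy documents, the Laplace smoothing constant `alpha`, and the optional per-word `weights` argument are all hypothetical. The `weights` dictionary only marks where a per-word weight (such as one derived from mutual information) could enter the score; the paper's actual weighting, parameter estimation, and length normalization formulas are not given in this preview and are not reproduced here.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log word likelihoods."""
    vocab = sorted({w for d in docs for w in d})
    classes = sorted(set(labels))
    log_prior, log_like = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        # Pool all tokens of the class into one bag of words.
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_like


def classify(doc, log_prior, log_like, weights=None):
    """Return the class with the highest log score.

    `weights` is a hypothetical per-word multiplier (default 1.0 for
    every word), included only to show where a word weight could act.
    """
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w, tf in Counter(doc).items():
            if w in log_like[c]:  # ignore out-of-vocabulary words
                wt = 1.0 if weights is None else weights.get(w, 1.0)
                score += wt * tf * log_like[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
train_docs = [
    ["oil", "price", "rise"],
    ["oil", "barrel", "export"],
    ["match", "goal", "team"],
    ["team", "win", "match"],
]
train_labels = ["energy", "energy", "sport", "sport"]

log_prior, log_like = train_multinomial_nb(train_docs, train_labels)
energy_guess = classify(["oil", "export", "price"], log_prior, log_like)
sport_guess = classify(["team", "goal"], log_prior, log_like)
weighted_guess = classify(
    ["oil", "export"], log_prior, log_like, weights={"oil": 2.0}
)
```

Because the score is a sum of term-frequency-weighted log likelihoods, longer documents accumulate larger magnitudes, which is the kind of effect that document length normalization is meant to address.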

Author information

Authors and Affiliations

Dept. of CSE, Korea University, Anam-dong 5 ka, SungPuk-gu, SEOUL, 136-701, Korea
Sang-Bum Kim, Hae-Chang Rim & DongSuk Yook
Dept. of Info&Comm, Chonan University, Anseo-Dong, Chonan, ChungChong-NamDo, 330-180, Korea
Heui-Seok Lim

Editor information

Editors and Affiliations

School of Information Science and Technology Department of Information and Communication Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Mitsuru Ishizuka
School of Information Technology Knowledge Representation and Reasoning Unit (KRRU) Faculty of Engineering and Information Technology, Griffith University, PMB 50 Gold Coast Mail Centre, Queensland, 9726, Australia
Abdul Sattar

About this paper

Cite this paper

Kim, SB., Rim, HC., Yook, D., Lim, HS. (2002). Effective Methods for Improving Naive Bayes Text Classifiers. In: Ishizuka, M., Sattar, A. (eds) PRICAI 2002: Trends in Artificial Intelligence. PRICAI 2002. Lecture Notes in Computer Science, vol 2417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45683-X_45

DOI: https://doi.org/10.1007/3-540-45683-X_45
Published: 21 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44038-3
Online ISBN: 978-3-540-45683-4
eBook Packages: Springer Book Archive
