Publication: IEICE TRANSACTIONS on Information and Systems, Vol. E88-D, No. 5, pp. 1091-1094
Publication Date: 2005/05/01
DOI: 10.1093/ietisy/e88-d.5.1091
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Natural Language Processing
Keyword: text classification, naive Bayes,
Summary: The multinomial naive Bayes model has been widely used for probabilistic text classification. However, its parameter estimation sometimes produces inappropriate probabilities. In this paper, we propose a topic document model for multinomial naive Bayes text classification, in which the parameters are estimated from the normalized term frequencies of each training document. Experiments on the Reuters-21578 and 20 Newsgroups collections show that the proposed approach significantly improves performance over the traditional multinomial naive Bayes.
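The contrast between pooled counts and per-document normalized frequencies can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not give the exact normalization or smoothing, so this sketch assumes two things. First, each document's term counts are divided by its length, so every training document contributes a total mass of 1 to its class. Second, Laplace smoothing with a hypothetical `alpha` parameter is applied. All function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict


def _smooth(mass, vocab, alpha):
    """Laplace-smoothed P(w | c) from per-class term mass (assumed smoothing)."""
    params = {}
    for label, m in mass.items():
        total = sum(m.values()) + alpha * len(vocab)
        params[label] = {w: (m.get(w, 0.0) + alpha) / total for w in vocab}
    return params


def train_standard(docs, labels, vocab, alpha=1.0):
    """Traditional multinomial NB: pool raw term counts over each class."""
    mass = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        mass[label].update(doc)
    return _smooth(mass, vocab, alpha)


def train_normalized(docs, labels, vocab, alpha=1.0):
    """Assumed variant: divide each document's term counts by its length,
    so every training document contributes the same total mass (1.0)."""
    mass = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        if not doc:
            continue  # skip empty documents to avoid division by zero
        for w, c in Counter(doc).items():
            mass[label][w] += c / len(doc)
    return _smooth(mass, vocab, alpha)


def priors(labels):
    """Class priors estimated from training label frequencies."""
    counts = Counter(labels)
    return {y: n / len(labels) for y, n in counts.items()}


def classify(doc, params, prior):
    """argmax_c log P(c) + sum_w log P(w | c); out-of-vocabulary words ignored."""
    def score(label):
        p = params[label]
        return math.log(prior[label]) + sum(math.log(p[w]) for w in doc if w in p)
    return max(params, key=score)


# Toy data: one long class-A document made only of "x".
docs = [["x"] * 9, ["y"], ["y"], ["x", "z"], ["z"]]
labels = ["A", "A", "A", "B", "B"]
vocab = {"x", "y", "z"}

std = train_standard(docs, labels, vocab)
norm = train_normalized(docs, labels, vocab)
print(round(std["A"]["x"], 3), round(norm["A"]["x"], 3))  # 0.714 0.333
```

With pooled counts, the single long document pushes P(x | A) to 10/14. After per-document normalization, it counts as one document among three, and P(x | A) falls to 2/6. This is the kind of length-driven skew that estimating from normalized term frequencies is meant to reduce.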