Abstract
Naive Bayes (NB) is one of the top 10 data mining algorithms thanks to its simplicity, efficiency, and interpretability. To weaken its attribute independence assumption, the naive Bayes tree (NBTree) was proposed. NBTree is a hybrid algorithm that deploys a naive Bayes classifier on each leaf node of a built decision tree, and it has demonstrated remarkable classification performance. When it comes to text classification tasks, multinomial naive Bayes (MNB) has become the dominant modeling approach, superseding the multi-variate Bernoulli model. Inspired by the success of NBTree, we propose a new algorithm called multinomial naive Bayes tree (MNBTree), which deploys a multinomial naive Bayes text classifier on each leaf node of the built decision tree. Unlike NBTree, MNBTree builds a binary tree in which each split attribute's values are simply divided into zero and nonzero. In addition, MNBTree uses the information gain measure instead of the classification accuracy measure to build the tree, which reduces training time. To further improve the classification performance of MNBTree, we propose its multiclass learning version, called multiclass multinomial naive Bayes tree (MMNBTree), by applying the multiclass technique to MNBTree. Experimental results on a large number of widely used text classification benchmark datasets validate the effectiveness of the proposed algorithms, MNBTree and MMNBTree.
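The core idea of MNBTree, as the abstract describes it, can be sketched in a few lines: grow a binary tree whose internal nodes test whether a term's count is zero or nonzero, choose the split term by information gain, and fit a multinomial naive Bayes classifier on the documents reaching each leaf. The following is a minimal illustrative sketch, not the authors' implementation; the class names, hyperparameters (`max_depth`, `min_leaf`), and stopping rules are assumptions made for brevity.

```python
import numpy as np

class SimpleMNB:
    """Multinomial naive Bayes with Laplace smoothing (the leaf classifier)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        counts = np.array([X[y == c].sum(axis=0) for c in self.classes_]) + 1.0
        self.log_lik_ = np.log(counts / counts.sum(axis=1, keepdims=True))
        return self

    def predict(self, X):
        scores = X @ self.log_lik_.T + self.log_prior_
        return self.classes_[np.argmax(scores, axis=1)]

def entropy(y):
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

class MNBTreeSketch:
    """Binary tree whose splits test zero vs. nonzero term counts,
    chosen by information gain; each leaf holds a SimpleMNB classifier."""
    def __init__(self, max_depth=2, min_leaf=5):
        self.max_depth, self.min_leaf = max_depth, min_leaf

    def fit(self, X, y):
        self.root_ = self._build(X, y, depth=0)
        return self

    def _build(self, X, y, depth):
        if depth >= self.max_depth or len(y) < 2 * self.min_leaf \
                or len(np.unique(y)) == 1:
            return ('leaf', SimpleMNB().fit(X, y))
        base = entropy(y)
        best_gain, best_j = 0.0, None
        for j in range(X.shape[1]):
            mask = X[:, j] > 0                      # nonzero branch
            nl, nr = mask.sum(), (~mask).sum()
            if nl < self.min_leaf or nr < self.min_leaf:
                continue
            gain = base - (nl * entropy(y[mask]) + nr * entropy(y[~mask])) / len(y)
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:
            return ('leaf', SimpleMNB().fit(X, y))
        mask = X[:, best_j] > 0
        return ('split', best_j,
                self._build(X[mask], y[mask], depth + 1),
                self._build(X[~mask], y[~mask], depth + 1))

    def predict(self, X):
        return np.array([self._predict_one(x) for x in X])

    def _predict_one(self, x):
        node = self.root_
        while node[0] == 'split':
            node = node[2] if x[node[1]] > 0 else node[3]
        return node[1].predict(x.reshape(1, -1))[0]
```

Because every split only asks "is this term's count zero or nonzero", candidate evaluation is a single boolean mask per term, which is why replacing NBTree's accuracy-based split selection with information gain cuts training cost on high-dimensional text data.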
Acknowledgments
We would like to thank the anonymous reviewers for their valuable comments and suggestions. This work was partially supported by the National Natural Science Foundation of China (61203287), the Program for New Century Excellent Talents in University (NCET-12-0953), the Provincial Natural Science Foundation of Hubei (2011CDA103), and the Fundamental Research Funds for the Central Universities (CUG130504, CUG130414).
Cite this article
Wang, S., Jiang, L. & Li, C. Adapting naive Bayes tree for text classification. Knowl Inf Syst 44, 77–89 (2015). https://doi.org/10.1007/s10115-014-0746-y