A CFS-Based Feature Weighting Approach to Naive Bayes Text Classifiers

Wang, Shasha; Jiang, Liangxiao; Li, Chaoqun

doi:10.1007/978-3-319-11179-7_70

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8681)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Recent work in supervised learning has shown that naive Bayes text classifiers that make strong assumptions of independence among features, such as multinomial naive Bayes (MNB), complement naive Bayes (CNB) and the one-versus-all-but-one model (OVA), achieve remarkable classification performance. This raises the question of whether a naive Bayes text classifier with less restrictive assumptions can perform even better. To answer this question, we first evaluate the correlation-based feature selection (CFS) approach and find that applying it to these classifiers makes them perform even worse than the original versions. We then propose a CFS-based feature weighting approach to these naive Bayes text classifiers, and call the resulting feature weighted versions FWMNB, FWCNB and FWOVA respectively. Our approach weakens the strong assumptions of independence among features by weighting the correlated features. Experimental results on a large suite of benchmark datasets show that the feature weighted versions significantly outperform the original versions in terms of classification accuracy.
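
To make the idea of feature weighting in a multinomial naive Bayes classifier concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the weights are derived from CFS or where they enter the model, so this sketch assumes that each word's log-likelihood term is scaled by a per-feature weight at prediction time. The function names (`train_mnb`, `predict_fwmnb`), the toy data, the `selected` index list and the weight values 2.0 and 1.0 are all hypothetical.

```python
import numpy as np

def train_mnb(X, y, n_classes):
    """Estimate Laplace-smoothed log priors and log word probabilities.

    X is a (documents x words) matrix of word counts, y holds class ids.
    """
    n_docs, n_words = X.shape
    log_prior = np.empty(n_classes)
    log_cond = np.empty((n_classes, n_words))
    for c in range(n_classes):
        rows = X[y == c]
        # Smoothed class prior: (docs in class + 1) / (docs + classes)
        log_prior[c] = np.log((rows.shape[0] + 1.0) / (n_docs + n_classes))
        # Smoothed word probability: (count + 1) / (total count + vocabulary size)
        counts = rows.sum(axis=0)
        log_cond[c] = np.log((counts + 1.0) / (counts.sum() + n_words))
    return log_prior, log_cond

def predict_fwmnb(X, log_prior, log_cond, weights):
    """Score each class as log P(c) + sum_i W_i * f_i * log P(w_i | c).

    With all weights equal to 1 this reduces to standard multinomial NB.
    """
    scores = log_prior + (X * weights) @ log_cond.T
    return scores.argmax(axis=1)

# Toy corpus: 4 documents, 4 vocabulary words, 2 classes (illustrative only).
X = np.array([[3, 0, 1, 0],
              [2, 1, 0, 0],
              [0, 3, 0, 1],
              [0, 2, 1, 2]])
y = np.array([0, 0, 1, 1])

# Hypothetical output of a feature selector such as CFS: indices of chosen words.
selected = [0, 1]
weights = np.ones(X.shape[1])   # assumed weight for unselected features
weights[selected] = 2.0         # assumed larger weight for selected features

log_prior, log_cond = train_mnb(X, y, n_classes=2)
pred = predict_fwmnb(X, log_prior, log_cond, weights)
print(pred)
```

Unlike hard feature selection, this keeps every word in the model and only changes how much each one contributes to the class score, which is one way to read the contrast the abstract draws between CFS selection and CFS-based weighting.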

Author information

Authors and Affiliations

Department of Computer Science, China University of Geosciences, Wuhan, Hubei, China, 430074
Shasha Wang & Liangxiao Jiang
Department of Mathematics, China University of Geosciences, Wuhan, Hubei, China, 430074
Chaoqun Li

Editor information

Editors and Affiliations

Department of Informatics, University of Hamburg, Vogt-Kölln-Straße 30, 22527, Hamburg, Germany
Stefan Wermter, Cornelius Weber & Sven Magg
Department of Informatics, Nicolaus Copernicus University, ul. Grudziądzka 5, 87-100, Torun, Poland
Włodzisław Duch
Department of Modern Languages, University of Helsinki, P.O. Box 24, 00014, Helsinki, Finland
Timo Honkela
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev str. bl. 25A, 1113, Sofia, Bulgaria
Petia Koprinkova-Hristova
Institute of Neural Information Processing, University of Ulm, 89069, Oberer Eselsberg, Ulm, Germany
Günther Palm
Department of Information Systems, Quartier UNIL-Dorigny, Bâtiment Internef, University of Lausanne, 1015, Lausanne, Switzerland
Alessandro E. P. Villa

About this paper

Cite this paper

Wang, S., Jiang, L., Li, C. (2014). A CFS-Based Feature Weighting Approach to Naive Bayes Text Classifiers. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_70

DOI: https://doi.org/10.1007/978-3-319-11179-7_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11178-0
Online ISBN: 978-3-319-11179-7
eBook Packages: Computer Science, Computer Science (R0)
