Abstract
Recent work in supervised learning has shown that naive Bayes text classifiers that make strong assumptions of independence among features, such as multinomial naive Bayes (MNB), complement naive Bayes (CNB) and the one-versus-all-but-one model (OVA), achieve remarkable classification performance. This raises the question of whether a naive Bayes text classifier with less restrictive assumptions can perform even better. To answer it, we first evaluate the correlation-based feature selection (CFS) approach and find that the classifiers built with it perform even worse than the original versions. We then propose a CFS-based feature weighting approach for these naive Bayes text classifiers and call the resulting feature-weighted versions FWMNB, FWCNB and FWOVA, respectively. The proposed approach weakens the strong assumptions of independence among features by weighting the correlated features. Experimental results on a large suite of benchmark datasets show that the feature-weighted versions significantly outperform the original versions in terms of classification accuracy.
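To make the idea of feature weighting concrete, the following is a minimal sketch of a feature-weighted multinomial naive Bayes scorer. It is not the authors' implementation: the abstract does not say how CFS output is turned into weights, so the `weights` vector here is a hypothetical placeholder, and the scoring rule (each term's log-probability scaled by its weight times its frequency) is an assumed, commonly used form of feature-weighted naive Bayes. The function names `train_mnb` and `predict_weighted` are likewise illustrative.

```python
import numpy as np


def train_mnb(X, y, n_classes, alpha=1.0):
    """Estimate smoothed log priors and log term probabilities per class."""
    n_docs, n_terms = X.shape
    log_prior = np.zeros(n_classes)
    log_cond = np.zeros((n_classes, n_terms))
    for c in range(n_classes):
        Xc = X[y == c]
        # Laplace-smoothed class prior.
        log_prior[c] = np.log((Xc.shape[0] + alpha) / (n_docs + alpha * n_classes))
        # Laplace-smoothed term probabilities P(t_i | c).
        counts = Xc.sum(axis=0) + alpha
        log_cond[c] = np.log(counts / counts.sum())
    return log_prior, log_cond


def predict_weighted(X, log_prior, log_cond, weights):
    """Assumed scoring rule: log P(c) + sum_i w_i * f_i * log P(t_i | c)."""
    scores = log_prior + (X * weights) @ log_cond.T
    return scores.argmax(axis=1)


# Toy term-frequency matrix: 4 documents, 3 terms, 2 classes.
X_train = np.array([[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 0]], dtype=float)
y_train = np.array([0, 0, 1, 1])
log_prior, log_cond = train_mnb(X_train, y_train, n_classes=2)

# Hypothetical weights; in the paper they would be derived from CFS.
weights = np.array([2.0, 1.0, 1.0])
X_test = np.array([[1, 1, 0]], dtype=float)
print(predict_weighted(X_test, log_prior, log_cond, weights))
```

With all weights equal to 1 the scorer reduces to ordinary multinomial naive Bayes; raising the weight of a term increases its influence on the class decision.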
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, S., Jiang, L., Li, C. (2014). A CFS-Based Feature Weighting Approach to Naive Bayes Text Classifiers. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_70
DOI: https://doi.org/10.1007/978-3-319-11179-7_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11178-0
Online ISBN: 978-3-319-11179-7
eBook Packages: Computer Science, Computer Science (R0)