
A CFS-Based Feature Weighting Approach to Naive Bayes Text Classifiers

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2014 (ICANN 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8681)


Abstract

Recent work in supervised learning has shown that naive Bayes text classifiers that make strong assumptions of independence among features, such as multinomial naive Bayes (MNB), complement naive Bayes (CNB), and the one-versus-all-but-one model (OVA), achieve remarkable classification performance. This raises the question of whether a naive Bayes text classifier with less restrictive assumptions can perform even better. To answer this question, we first evaluate the correlation-based feature selection (CFS) approach and find that it performs even worse than the original classifiers. We then propose a CFS-based feature weighting approach for these naive Bayes text classifiers and call the feature-weighted versions FWMNB, FWCNB, and FWOVA, respectively. Our approach weakens the strong independence assumptions by weighting the correlated features. Experimental results on a large suite of benchmark datasets show that the feature-weighted versions significantly outperform the original versions in terms of classification accuracy.
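The abstract combines three ingredients: multinomial naive Bayes, correlation-based feature selection (CFS), and a weighting step in which the words chosen by CFS count more at classification time. The sketch below illustrates that combination under stated assumptions and is not the authors' FWMNB. It measures correlation with symmetric uncertainty on word presence (count > 0). It uses the standard CFS merit k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), but swaps the best-first search usually paired with CFS for a simpler greedy forward search. It also gives each selected word a hypothetical weight `W_SELECTED = 2.0` (unselected words keep 1.0), applied as an exponent on P(w_j | c). All function names here are illustrative, and CNB and OVA are not shown.

```python
import math
from collections import Counter

W_SELECTED = 2.0  # hypothetical weight for CFS-selected words; all other words get 1.0


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(xs, ys)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def cfs_merit(subset, rcf, rff):
    """CFS merit: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(rcf[f] for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_ff = sum(rff[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def cfs_select(X, y):
    """Greedy forward CFS on word presence; stops when the merit stops improving."""
    n_feat = len(X[0])
    pres = [[int(row[j] > 0) for row in X] for j in range(n_feat)]
    rcf = [symmetric_uncertainty(p, y) for p in pres]  # word-class correlations
    rff = [[symmetric_uncertainty(pres[a], pres[b]) for b in range(n_feat)]
           for a in range(n_feat)]  # word-word correlations
    selected, best = [], 0.0
    while len(selected) < n_feat:
        candidates = [j for j in range(n_feat) if j not in selected]
        # max() returns the first (lowest-index) candidate on ties
        j = max(candidates, key=lambda c: cfs_merit(selected + [c], rcf, rff))
        score = cfs_merit(selected + [j], rcf, rff)
        if score <= best:
            break
        selected.append(j)
        best = score
    return selected


def train_mnb(X, y, alpha=1.0):
    """Multinomial NB with Laplace smoothing: class priors and P(w_j | c)."""
    classes = sorted(set(y))
    n_feat = len(X[0])
    prior, cond = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = (len(rows) + 1) / (len(X) + len(classes))
        totals = [sum(r[j] for r in rows) for j in range(n_feat)]
        denom = sum(totals) + alpha * n_feat
        cond[c] = [(t + alpha) / denom for t in totals]
    return prior, cond


def predict_weighted(x, prior, cond, weights):
    """argmax_c  log P(c) + sum_j W_j * f_j * log P(w_j | c)."""
    def score(c):
        return math.log(prior[c]) + sum(
            w * f * math.log(p) for w, f, p in zip(weights, x, cond[c]))
    return max(prior, key=score)


# Toy usage: rows are word-count vectors over a 4-word vocabulary.
X = [[3, 1, 0, 0], [2, 2, 0, 0], [1, 3, 0, 1],
     [0, 0, 3, 1], [0, 1, 2, 2], [0, 0, 1, 3]]
y = ["a", "a", "a", "b", "b", "b"]
selected = cfs_select(X, y)
weights = [W_SELECTED if j in selected else 1.0 for j in range(len(X[0]))]
prior, cond = train_mnb(X, y)
print(selected, predict_weighted([2, 0, 0, 1], prior, cond, weights))  # prints: [0] a
```

In this toy data, words 0 and 2 are each perfectly correlated with the class but also perfectly redundant with each other. CFS therefore keeps only word 0, and only that word is up-weighted. This is the behaviour the merit's denominator is designed to produce.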




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, S., Jiang, L., Li, C. (2014). A CFS-Based Feature Weighting Approach to Naive Bayes Text Classifiers. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_70


  • DOI: https://doi.org/10.1007/978-3-319-11179-7_70

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11178-0

  • Online ISBN: 978-3-319-11179-7

  • eBook Packages: Computer Science, Computer Science (R0)
