DOI: 10.1145/3297156.3297256
research-article

Naive Bayes Text Categorization Algorithm Based on TF-IDF Attribute Weighting

Published: 08 December 2018

Abstract

Naive Bayes is a simple and efficient categorization algorithm. However, its assumption of conditional independence does not hold in practice, which limits its categorization performance to some extent. To improve the performance of Naive Bayes in text categorization, this paper proposes a Naive Bayes text categorization algorithm based on TF-IDF attribute weighting. We first use the χ² statistic to select text features, and then assign different weights to the different feature word sets according to the TF-IDF method. Finally, the improved Naive Bayes algorithm, TFIDFMNB, is used to categorize the test text sets. Experimental comparison and analysis show that the TFIDFMNB algorithm performs well on text categorization.
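The three stages named in the abstract (χ² feature selection, TF-IDF weighting, Naive Bayes categorization) can be illustrated with a minimal sketch. It is not the paper's TFIDFMNB, because the abstract does not give the exact weighting formula. The sketch assumes a multinomial Naive Bayes whose term counts are replaced by TF-IDF weights, a smoothed IDF, Laplace smoothing, and a χ² score taken as the maximum over classes. The function names (`chi2_scores`, `train`, `predict`), the parameter `k`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def chi2_scores(docs, labels):
    """Chi-square score per term from document-level presence, max over classes."""
    n = len(docs)
    class_n = Counter(labels)
    df = Counter()                  # documents containing the term
    df_c = defaultdict(Counter)     # documents of a class containing the term
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            df[t] += 1
            df_c[y][t] += 1
    scores = {}
    for t in df:
        best = 0.0
        for c in class_n:
            n11 = df_c[c][t]            # in class, term present
            n10 = df[t] - n11           # other classes, term present
            n01 = class_n[c] - n11      # in class, term absent
            n00 = n - n11 - n10 - n01   # other classes, term absent
            denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
            if denom:
                best = max(best, n * (n11 * n00 - n10 * n01) ** 2 / denom)
        scores[t] = best
    return scores


def train(docs, labels, k=1000):
    """Keep the k highest-scoring terms, then fit NB on TF-IDF-weighted counts."""
    scores = chi2_scores(docs, labels)
    vocab = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens))
    # Smoothed IDF (an assumption; the paper may use a different variant).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    class_n = Counter(labels)
    weight = {c: Counter() for c in class_n}
    for tokens, y in zip(docs, labels):
        for t, f in Counter(tokens).items():
            if t in idf:
                weight[y][t] += f * idf[t]      # TF-IDF mass per class
    log_prior = {c: math.log(class_n[c] / n) for c in class_n}
    log_like = {}
    for c in class_n:
        total = sum(weight[c].values()) + len(vocab)    # Laplace smoothing
        log_like[c] = {t: math.log((weight[c][t] + 1) / total) for t in vocab}
    return log_prior, log_like, idf


def predict(model, tokens):
    """Return the class with the highest TF-IDF-weighted log posterior."""
    log_prior, log_like, idf = model
    counts = Counter(t for t in tokens if t in idf)

    def score(c):
        return log_prior[c] + sum(
            f * idf[t] * log_like[c][t] for t, f in counts.items())

    return max(sorted(log_prior), key=score)


# Toy corpus, invented for illustration only.
train_docs = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["code", "bug", "compiler"],
    ["code", "python", "bug"],
]
train_labels = ["sports", "sports", "tech", "tech"]

model = train(train_docs, train_labels, k=4)
print(predict(model, ["goal", "team", "win"]))
```

Selecting features before weighting keeps the vocabulary small, so words outside the selected set (such as "win" in the example) are simply ignored at prediction time.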

Published In

CSAI '18: Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence
December 2018
641 pages
ISBN:9781450366069
DOI:10.1145/3297156
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Shenzhen University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. χ² Statistic
  2. Naive Bayes
  3. TF-IDF
  4. Text Categorization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CSAI '18

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Feb 2025

Cited By

  • (2024) An Effective Machine Learning Approach with Hyper-parameter Tuning for Sentiment Analysis. Data Intelligence. DOI: 10.3724/2096-7004.di.2024.0060. Online publication date: 1-Nov-2024
  • (2024) The Fake News Recognition Method Based on Naïve Bayes with Improved TF-IDF Algorithm. Mathematical Modeling and Simulation of Systems, pp. 158-171. DOI: 10.1007/978-3-031-67348-1_12. Online publication date: 7-Sep-2024
  • (2022) Algorithms for the classification of text documents, taking into account proximity in the attribute space. Modeling of systems and processes, 15:1, pp. 36-43. DOI: 10.12737/2219-0767-2022-15-1-36-43. Online publication date: 8-Apr-2022
  • (2021) IFTA: Iterative filtering by using TF-AICL algorithm for Chinese encyclopedia knowledge refinement. Applied Intelligence. DOI: 10.1007/s10489-021-02220-w. Online publication date: 22-Feb-2021
