Abstract:
Feature Selection (FS) is one of the most important issues in Text Categorization (TC). Empirical studies show that Information Gain (IG) is an effective FS method. However, because traditional IG pays little attention to term frequency and also takes into account the case in which a term does not appear, its performance is not ideal. In this paper, we put forward an improved information gain-based feature selection method that uses term frequency information and a balance factor (IGTB) for statistical machine learning-based text categorization. Our feature selection method aims to pick out the key feature items in the text corpus precisely. Experiments on the Reuters-21578 and WebKB collections show that our method effectively improves categorization accuracy compared with conventional information gain and other methods.
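For orientation, the following is a minimal sketch of the conventional, textbook form of IG that the abstract uses as its baseline. It is not the IGTB method proposed in the paper, whose term frequency weighting and balance factor are not specified here. The function names, the toy documents, and the labels are illustrative assumptions. The sketch shows the two properties the abstract criticizes: a term is counted only as present or absent in a document (repeated occurrences are ignored), and the documents where the term is absent contribute to the score.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Conventional IG of `term`: H(C) - P(t) H(C|t) - P(not t) H(C|not t).

    docs   : list of token lists (hypothetical toy input)
    labels : class label of each document, in the same order
    """
    classes = sorted(set(labels))
    overall = Counter(labels)
    # Binary presence only: how often the term occurs inside a document is ignored.
    present = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    # The absence case also enters the score.
    absent = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)

    n = len(docs)
    n_present = sum(present.values())
    n_absent = n - n_present

    h_c = entropy([overall[c] for c in classes])
    h_present = entropy([present[c] for c in classes])
    h_absent = entropy([absent[c] for c in classes])
    return h_c - (n_present / n) * h_present - (n_absent / n) * h_absent


# Toy corpus (illustrative only, not taken from Reuters-21578 or WebKB).
docs = [
    ["wheat", "price"],
    ["wheat", "wheat", "crop"],
    ["bank", "rate"],
    ["bank", "loan"],
]
labels = ["grain", "grain", "money", "money"]

ig_wheat = information_gain(docs, labels, "wheat")
ig_price = information_gain(docs, labels, "price")
ig_unseen = information_gain(docs, labels, "tractor")
```

In this toy corpus "wheat" separates the two classes perfectly and scores higher than "price", while the second occurrence of "wheat" in the second document has no effect on the score.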
Date of Conference: 11-14 May 2014
Date Added to IEEE Xplore: 23 October 2014