A term weighting scheme based on the measure of relevance and distinction for text categorization | IEEE Conference Publication | IEEE Xplore

Abstract:

Feature selection is often considered a key step in text categorization. In this paper, we propose a new feature selection algorithm, named AD, which comprehensively measures the degree of relevance and distinction of terms occurring in a document set. We evaluated AD on three benchmark document collections (20-Newsgroups, Reuters-21578 and WebKB) using two classification algorithms, Naive Bayes and Support Vector Machines. The experimental results compare AD with six classic feature selection algorithms: Information Gain (IG), Mutual Information (MI), Odds Ratio (OR), DIA association factor (DIA), Orthogonal Centroid Feature Selection (OCFS) and Ambiguity Measure (AM). They show that AD significantly outperforms all six, both when the Naive Bayes classifier is used and when Support Vector Machines are used.
Date of Conference: 01-03 June 2015
Date Added to IEEE Xplore: 06 August 2015
Electronic ISBN: 978-1-4799-8676-7
Conference Location: Takamatsu, Japan
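
The abstract does not define how AD scores a term, so the following is only a minimal sketch of one of the baselines it names, Information Gain, applied as a feature selection step. The function names (`entropy`, `information_gain`, `select_top_k`) and the four-document toy corpus are assumptions made for illustration; they are not taken from the paper and do not represent AD or the benchmark collections.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """Score each term by the information its presence/absence gives about the class.

    docs: list of token lists; labels: parallel list of class names.
    IG(t) = H(C) - P(t) * H(C | t present) - P(not t) * H(C | t absent)
    """
    n = len(docs)
    class_counts = Counter(labels)
    classes = sorted(class_counts)
    h_class = entropy([class_counts[c] for c in classes])

    # For every term, count the documents of each class that contain it.
    present = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            present.setdefault(term, Counter())[label] += 1

    scores = {}
    for term, with_term in present.items():
        n_with = sum(with_term.values())
        n_without = n - n_with
        h_with = entropy([with_term[c] for c in classes])
        h_without = entropy([class_counts[c] - with_term[c] for c in classes])
        scores[term] = (
            h_class - (n_with / n) * h_with - (n_without / n) * h_without
        )
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, used only to exercise the functions above.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "team"],
    ["market", "bank", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = information_gain(docs, labels)
top_terms = select_top_k(scores, 3)
```

On this toy corpus the terms "goal", "market" and "stock" each appear in exactly one class and receive the maximum score of 1 bit, while "team", which appears in both classes, scores lower. In an evaluation like the one the abstract describes, the selected terms would then define the feature space passed to a classifier such as Naive Bayes or a Support Vector Machine.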
