DOI: 10.1145/3380625.3380677
research-article

Comparison on Feature Selection Methods for Text Classification

Published: 19 May 2020

Abstract

High-dimensional text data typically contains a large number of noisy terms, which degrade the performance of text classification. Feature selection is a common solution for dimensionality reduction in text classification, and the choice of feature selection method has a significant impact on classification accuracy. According to our literature review, few recent studies focus on comparing the performance of feature selection methods. To fill this gap, this paper compares the performance of typical feature selection methods that are commonly used in previous studies of text classification. First, we introduce and discuss in detail a series of typical feature selection methods from previous studies. Second, we conduct comparison experiments on four benchmark datasets to evaluate the effectiveness of twenty typical feature selection methods in text classification. Finally, we draw conclusions on the performance of these methods. The results provide a guideline for selecting appropriate feature selection methods for academic text classification analysis or real-world text classification applications.
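
To make the idea of feature selection as dimension reduction concrete, the following is a minimal sketch of one filter-style method. The abstract does not name the twenty methods compared, so the chi-square statistic is used here purely as an assumed, widely known example; the function names `chi_square_scores` and `select_top_k` and the toy corpus are hypothetical and not taken from the paper.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over all classes.

    Terms are counted by document-level presence or absence (an assumption
    made for this sketch, not a detail from the paper).
    """
    n = len(docs)
    doc_terms = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_terms)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_sizes.items():
            # 2x2 contingency table for (term present?, document in cls?)
            n11 = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == cls)
            n10 = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != cls)
            n01 = n_cls - n11           # term absent, document in cls
            n00 = n - n11 - n10 - n01   # term absent, document not in cls
            denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
            chi2 = n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0
            best = max(best, chi2)
        scores[term] = best
    return scores


def select_top_k(docs, labels, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical): two "spam" and two "ham" documents.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda monday",
    "project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

selected = select_top_k(docs, labels, k=4)
print(selected)
```

In this sketch the ten-term vocabulary is reduced to the four terms that separate the two classes perfectly, while terms occurring in only one document receive lower scores and are dropped.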



    Published In

    ICMSS 2020: Proceedings of the 2020 4th International Conference on Management Engineering, Software Engineering and Service Sciences
    January 2020
    301 pages
    ISBN:9781450376419
    DOI:10.1145/3380625
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • China University of Geosciences

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Feature selection
    2. Text classification
    3. Text mining

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICMSS 2020

    Cited By

    • (2024) Evaluating text classification: A benchmark study. Expert Systems with Applications, 254 (124302). DOI: 10.1016/j.eswa.2024.124302. Online publication date: Nov-2024
    • (2024) A comprehensive review of cyberbullying-related content classification in online social media. Expert Systems with Applications, 244 (122644). DOI: 10.1016/j.eswa.2023.122644. Online publication date: Jun-2024
    • (2023) TF-Predictor: Transformer-Based Prerouting Path Delay Prediction Framework. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42:7 (2227-2237). DOI: 10.1109/TCAD.2022.3216752. Online publication date: Jul-2023
    • (2023) Experimental Analysis of the Machine Learning Algorithms for Crime Web Page Classification. IETE Journal of Research, 70:5 (4890-4902). DOI: 10.1080/03772063.2023.2222513. Online publication date: 13-Jun-2023
    • (2021) A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities. Neural Computing and Applications, 33:22 (15091-15118). DOI: 10.1007/s00521-021-06406-8. Online publication date: 1-Nov-2021
