Abstract
The main objective of feature selection in machine learning classification is to reduce the number of features by removing irrelevant and noisy ones, with the goal of improving the accuracy and efficiency of the classification model. As with continuous and mixed data classification, feature selection has been applied to improve categorical data classification. On large datasets with tens of features, however, existing feature selection methods score lower on accuracy metrics than baseline categorical data classification models that use the full feature set. This paper presents a feature selection method that integrates Rough Set Attribute Reduction with a classical filter-based feature selection method to improve the performance of categorical data classification. Two large categorical datasets from the UCI repository are used to evaluate the method. Support Vector Machine, Random Forest, and Multilayer Perceptron algorithms are used as the machine learning classifiers. The results show that the proposed method outperforms existing feature selection models in terms of Accuracy, Precision, Recall, and F-measure, both for individual classes and for their weighted average scores, in both case studies. When benchmarked against the baseline classification models, the proposed method achieves its best overall performance with Random Forest.
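To make the idea of combining rough set attribute reduction with a filter concrete, the following is a minimal sketch and not the authors' implementation. It assumes a greedy QuickReduct-style search for the rough set step, information gain as the filter, and a simple union of the reduct with the top-k filter-ranked features as the integration rule; the abstract specifies none of these choices. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def dependency(rows, labels, attrs):
    """Rough set dependency degree: fraction of rows whose values on
    `attrs` fall in a block where every row has the same class label."""
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        blocks[tuple(row[a] for a in attrs)].append(label)
    consistent = sum(len(b) for b in blocks.values() if len(set(b)) == 1)
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedy search (assumed QuickReduct-style): keep adding the attribute
    that raises the dependency degree most until it matches the full set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in all_attrs if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def information_gain(rows, labels, attr):
    """Filter score (assumed): class entropy minus the entropy remaining
    after splitting the rows on one categorical attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(rows, labels, k):
    """Hypothetical integration rule: union of the rough set reduct and
    the k attributes with the highest information gain."""
    reduct = quick_reduct(rows, labels)
    ranked = sorted(
        range(len(rows[0])),
        key=lambda a: information_gain(rows, labels, a),
        reverse=True,
    )
    return sorted(set(reduct) | set(ranked[:k]))


# Toy categorical data: attribute 0 determines the class,
# attribute 1 is noise, attribute 2 is constant.
rows = [("a", "x", "k"), ("a", "y", "k"), ("b", "x", "k"), ("b", "y", "k")]
labels = ["pos", "pos", "neg", "neg"]
print(select_features(rows, labels, k=1))  # [0]
```

On the toy data, both steps agree that only attribute 0 is informative, so the selected subset has a single feature. On real datasets the two steps can disagree, and how their outputs are merged is the design decision this sketch only guesses at.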
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Oriola, O., Kotzé, E., Atawodi, O. (2024). A Feature Selection Method Based on Rough Set Attribute Reduction and Classical Filter-Based Feature Selection for Categorical Data Classification. In: Florez, H., Leon, M. (eds) Applied Informatics. ICAI 2023. Communications in Computer and Information Science, vol 1874. Springer, Cham. https://doi.org/10.1007/978-3-031-46813-1_1
DOI: https://doi.org/10.1007/978-3-031-46813-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46812-4
Online ISBN: 978-3-031-46813-1
eBook Packages: Computer Science, Computer Science (R0)