A Feature Selection Method Based on Rough Set Attribute Reduction and Classical Filter-Based Feature Selection for Categorical Data Classification

  • Conference paper
Applied Informatics (ICAI 2023)

Abstract

The main objective of feature selection in machine learning classification is to reduce the number of features by removing irrelevant and noisy ones, with the goal of improving the accuracy and efficiency of the classification model. As with continuous and mixed data classification, feature selection has been applied to improve categorical data classification. On large datasets with tens of features, however, existing feature selection methods perform worse in terms of accuracy metrics than baseline categorical data classification models trained on the full feature set. This paper presents a feature selection method that integrates Rough Set Attribute Reduction with classical filter-based feature selection to improve the performance of categorical data classification. Two large categorical datasets from the UCI repository are used to evaluate the method. Support Vector Machine, Random Forest, and Multilayer Perceptron algorithms are used as machine learning classifiers. The results show that the proposed method outperforms existing feature selection models in terms of Accuracy, Precision, Recall, and F-measure, both for individual classes and for their weighted average scores, in both case studies. When benchmarked against the baseline classification models, the proposed method achieves its best overall performance with Random Forest.
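
The abstract does not state how the two stages are integrated. As a minimal sketch only, assuming a QuickReduct-style greedy reduction for the rough set stage and information gain as the filter score, the two ingredients might look as follows on a toy categorical table. The function names, the toy data, and the final ranking step are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough set dependency degree: share of rows whose equivalence
    class (under attrs) carries a single class label."""
    positive = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return positive / len(rows)


def quick_reduct(rows, labels, n_attrs):
    """Greedily add attributes until the dependency of the full set is reached."""
    target = dependency(rows, labels, list(range(n_attrs)))
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in range(n_attrs) if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum(c / total * log2(c / total) for c in Counter(values).values())


def information_gain(rows, labels, attr):
    """Filter score: reduction in label entropy after splitting on attr."""
    remainder = sum(
        len(block) / len(rows) * entropy([labels[i] for i in block])
        for block in partition(rows, [attr])
    )
    return entropy(labels) - remainder


# Hypothetical toy table: the label is "yes" only when column 0 is "a"
# and column 2 is "p"; column 1 is noise.
rows = [
    ("a", "x", "p"),
    ("a", "y", "q"),
    ("b", "x", "p"),
    ("b", "y", "q"),
    ("a", "y", "p"),
    ("b", "x", "q"),
]
labels = ["yes", "no", "no", "no", "yes", "no"]

reduct = quick_reduct(rows, labels, n_attrs=3)

# Assumed combination step: rank the surviving attributes by the filter score.
ranked = sorted(
    reduct, key=lambda a: information_gain(rows, labels, a), reverse=True
)
print("reduct:", reduct)
print("ranked by information gain:", ranked)
```

On this toy table the noise column is never selected, because adding it never enlarges the positive region as much as the informative columns do. On real data, the selected columns would then be passed to a classifier such as Random Forest and compared against the full-feature baseline.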

Author information

Corresponding author

Correspondence to Oluwafemi Oriola.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Oriola, O., Kotzé, E., Atawodi, O. (2024). A Feature Selection Method Based on Rough Set Attribute Reduction and Classical Filter-Based Feature Selection for Categorical Data Classification. In: Florez, H., Leon, M. (eds) Applied Informatics. ICAI 2023. Communications in Computer and Information Science, vol 1874. Springer, Cham. https://doi.org/10.1007/978-3-031-46813-1_1

  • DOI: https://doi.org/10.1007/978-3-031-46813-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46812-4

  • Online ISBN: 978-3-031-46813-1

  • eBook Packages: Computer Science, Computer Science (R0)
