Abstract
Feature selection has recently become a challenging problem for many machine learning disciplines. The success of most existing approaches depends on the effectiveness of the search strategy used to select the most salient features from the original feature space. Unfortunately, these approaches may become impractical on high-dimensional datasets. To overcome this problem, recent studies rely on a boundary scheme (i.e., fixing the number of selected features) to reduce the search space, or on a ranking scheme (e.g., keeping features with less correlated scores) to guide the selection phase. However, choosing the best-fitted size for the feature subset is itself a hard problem, and relying on a single feature-comparison criterion may ignore important features. In this paper, we propose a genetic algorithm (GA) that optimizes both the feature subset and the appropriate number of selected features to maximize the performance of an Artificial Neural Network (ANN) classifier. To improve the efficiency of the selection phase, we combine the proposed GA with a local search algorithm based on a ranking-aggregation approach. Our objective is to speed up the search by taking advantage of different feature-scoring criteria. We have assessed the performance of our approach on three categories of datasets: small-, medium-, and high-dimensional (the smallest and the largest datasets contain 8 and 7129 features, respectively). The empirical results show that our proposed approach outperforms other state-of-the-art works on medium- and high-dimensional datasets and is comparable to them on small-dimensional datasets.
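The ranking-aggregation idea described above can be illustrated with a classic Borda-count scheme: each scoring criterion ranks all features, each feature receives points proportional to how many features it outranks under that criterion, and points are summed across criteria. This is a minimal sketch, not the authors' exact aggregation method; the function name and the two example criteria are hypothetical.

```python
import numpy as np

def borda_aggregate(score_lists):
    """Aggregate several per-feature score lists into one consensus ranking
    via Borda count: under each criterion, a feature earns one point for
    every feature it outranks, and points are summed across criteria."""
    score_lists = np.asarray(score_lists, dtype=float)  # shape: (criteria, features)
    n_criteria, n_features = score_lists.shape
    borda = np.zeros(n_features)
    for scores in score_lists:
        order = np.argsort(scores)             # ascending: worst feature first
        ranks = np.empty(n_features)
        ranks[order] = np.arange(n_features)   # worst earns 0 points, best n-1
        borda += ranks
    return np.argsort(-borda)                  # feature indices, best first

# Example: two hypothetical criteria (e.g., a correlation score
# and a mutual-information score) over three features.
crit_a = [0.9, 0.1, 0.5]
crit_b = [0.8, 0.2, 0.6]
print(borda_aggregate([crit_a, crit_b]))  # -> [0 2 1]
```

A consensus ranking of this kind can then seed or guide a GA's local search, so that candidate subsets favor features that score well under several criteria rather than just one.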
Notes
1. The accuracy of the classification model is calculated using the testing dataset.
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Trung, B.Q., Duc, L.M., Anh, B.T.M. (2022). A Hybrid Approach Based on Genetic Algorithm with Ranking Aggregation for Feature Selection. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science(), vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08529-1
Online ISBN: 978-3-031-08530-7
eBook Packages: Computer Science (R0)