Abstract
Feature selection is an essential task in machine learning and data mining: it identifies a subset of relevant features within a larger set. This paper proposes a novel technique for unsupervised feature selection that combines a neural network with an evolutionary algorithm. The method aims to extract subsets of the most discriminative and relevant features from high-dimensional data, which can then be used for efficient and accurate machine learning. The evolutionary algorithm generates candidate feature subsets. The quality of each subset is measured by how well a neural network, trained to minimize mean squared error, reconstructs the whole original input space from that subset alone (in the manner of an auto-encoder). Experimental results show that the approach finds relevant feature subsets for subsequent learning tasks. It achieves better classification and regression accuracy than state-of-the-art feature selection methods.
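The search-plus-reconstruction idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' U-FLEX implementation.

- **Decoder.** As a stand-in for the neural network, it uses a linear decoder fitted by least squares. This is equivalent to a network with no hidden layer, trained to the MSE optimum.
- **Size penalty.** It adds a size penalty `alpha` to the held-out reconstruction error. This is an assumption: the abstract does not say how subset size is controlled. Without some such pressure, the full feature set would trivially reconstruct itself.
- **Search.** The evolutionary part is a plain bit-string genetic algorithm with binary tournament selection, uniform crossover, bit-flip mutation and (μ+λ) survival.
- **Names.** The function names `fitness` and `evolve`, and every parameter value, are illustrative.

```python
import numpy as np


def fitness(mask, X_tr, X_va, alpha=0.05):
    """Lower is better: held-out reconstruction MSE plus a subset-size penalty.

    Hypothetical stand-in for the paper's neural network: a linear decoder
    (least squares) maps the selected columns back to ALL columns.
    `alpha` is an assumed trade-off weight, not a value from the paper.
    """
    k = int(mask.sum())
    if k == 0:
        return float("inf")  # an empty subset cannot reconstruct anything
    # Selected features plus a bias column form the decoder input.
    A_tr = np.hstack([X_tr[:, mask], np.ones((len(X_tr), 1))])
    A_va = np.hstack([X_va[:, mask], np.ones((len(X_va), 1))])
    # Fit W to minimise ||A_tr @ W - X_tr||^2: the whole input is the target.
    W, *_ = np.linalg.lstsq(A_tr, X_tr, rcond=None)
    mse = float(np.mean((A_va @ W - X_va) ** 2))
    return mse + alpha * k / mask.size


def evolve(X, pop_size=30, generations=40, alpha=0.05, seed=0):
    """Bit-string GA over feature masks; returns (selected indices, best score)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardise features
    perm = rng.permutation(n)
    X_tr, X_va = X[perm[: n // 2]], X[perm[n // 2:]]   # train / validation split
    p_mut = 1.0 / d                                     # about one flip per child
    pop = rng.random((pop_size, d)) < 0.5               # random initial masks
    scores = np.array([fitness(m, X_tr, X_va, alpha) for m in pop])
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Binary tournament selection of two parents.
            a = rng.integers(pop_size, size=2)
            b = rng.integers(pop_size, size=2)
            p1 = pop[a[np.argmin(scores[a])]]
            p2 = pop[b[np.argmin(scores[b])]]
            # Uniform crossover, then bit-flip mutation.
            child = np.where(rng.random(d) < 0.5, p1, p2)
            child ^= rng.random(d) < p_mut
            children.append(child)
        children = np.array(children)
        child_scores = np.array([fitness(m, X_tr, X_va, alpha) for m in children])
        # (mu + lambda) survival: keep the best pop_size of parents and children.
        merged = np.vstack([pop, children])
        merged_scores = np.concatenate([scores, child_scores])
        keep = np.argsort(merged_scores)[:pop_size]
        pop, scores = merged[keep], merged_scores[keep]
    best = int(np.argmin(scores))
    return np.flatnonzero(pop[best]), float(scores[best])


# Synthetic data: nine features generated from three independent latent sources.
rng = np.random.default_rng(1)
Z = rng.standard_normal((400, 3))
M = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [0, 1, 1], [1, 0, 1],
              [1, 1, 1], [1, -1, 0], [1, 0, 2]], dtype=float)
X = Z @ M.T + 0.01 * rng.standard_normal((400, 9))

selected, score = evolve(X)
print("selected features:", selected, "score:", round(score, 4))
```

Because every feature here is a linear combination of three sources, any three linearly independent columns are enough to rebuild the rest. The size penalty then makes a fourth feature not worth keeping. A non-linear decoder, such as a small multilayer network, would follow the paper's description more closely, at the cost of training one model per fitness evaluation.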
Notes
- 1.
Promising results were also obtained on a confidential ("classified") industrial regression dataset. Using all the features, the nRMSE was 1.83%. The Pearson filter method selected 2.5% of the features, with an nRMSE of 1.86%. The supervised wrapper approach selected 33% of the features, with an nRMSE of 1.37%. Our approach converged to 19% of the features, with an nRMSE of 1.38%.
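The figures above are normalized root-mean-squared errors. The note does not state which normalizer was used, so the sketch below assumes one common convention: RMSE divided by the range of the true targets, expressed in percent. The function name `nrmse_percent` is hypothetical.

```python
import numpy as np


def nrmse_percent(y_true, y_pred):
    """RMSE divided by the range of y_true, in percent.

    Range normalisation is an assumption; the mean or the standard
    deviation of y_true are other common normalisers.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / (y_true.max() - y_true.min())


print(nrmse_percent([0.0, 10.0], [1.0, 9.0]))  # 10.0
```

Under any of these conventions lower is better. By that measure, the 19% subset (1.38%) performs close to the 33% wrapper subset (1.37%) and better than the full feature set (1.83%).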
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bellarmino, N., Cantoro, R., Squillero, G. (2024). U-FLEX: Unsupervised Feature Learning with Evolutionary eXploration. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14505. Springer, Cham. https://doi.org/10.1007/978-3-031-53969-5_27
DOI: https://doi.org/10.1007/978-3-031-53969-5_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53968-8
Online ISBN: 978-3-031-53969-5
eBook Packages: Computer Science, Computer Science (R0)