Genetic Programming for Feature Selection Based on Feature Removal Impact in High-Dimensional Symbolic Regression

Publisher: IEEE

Abstract:

Symbolic regression (SR) is increasingly important for discovering mathematical models for various prediction tasks. It works by searching for the arithmetic expressions that best represent a target variable using a set of input features. However, as the number of features increases, the search process becomes more complex. To address high-dimensional symbolic regression, this work proposes a genetic programming method for feature selection based on the impact of feature removal on the performance of SR models. Unlike existing Shapley value methods, which simulate feature absence at the data level, the proposed approach removes features at the model level. This circumvents the production of unrealistic data instances, which is a major limitation of Shapley value and permutation-based methods. Moreover, after the importance of the features is calculated, a cut-off strategy is proposed for selecting important features: it injects a number of random features and utilises their importance to set a threshold automatically. The experimental results on artificial and real-world high-dimensional data sets show that, compared with state-of-the-art feature selection methods using permutation importance and the Shapley value, the proposed method not only improves SR accuracy but also selects smaller sets of features.
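
The two ideas in the abstract, scoring a feature by what happens when it is removed from the model and using injected random features to set the cut-off, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the model is a hand-written sum of weighted features standing in for an evolved SR expression, "removal at the model level" is approximated by deleting the feature's term, the threshold is taken to be the largest impact among the random features, and the names `mse`, `removal_impact`, `select_above_probes`, `rand_1` to `rand_3`, `x1` and `x2` are hypothetical.

```python
import random

def mse(terms, rows, targets):
    """Mean squared error of a model given as {feature_name: coefficient}."""
    errors = [
        (sum(coef * row[name] for name, coef in terms.items()) - y) ** 2
        for row, y in zip(rows, targets)
    ]
    return sum(errors) / len(errors)

def removal_impact(terms, rows, targets):
    """Error increase when a feature's term is deleted from the model itself.

    The data rows are left untouched, so no artificial instances are created.
    """
    base = mse(terms, rows, targets)
    return {
        name: mse({k: v for k, v in terms.items() if k != name}, rows, targets) - base
        for name in terms
    }

def select_above_probes(impact, probe_names):
    """Keep real features whose impact exceeds that of every random probe."""
    threshold = max(impact[name] for name in probe_names)  # assumed cut-off rule
    kept = [n for n, s in impact.items() if n not in probe_names and s > threshold]
    return kept, threshold

rng = random.Random(0)
real = ["x1", "x2"]
probes = ["rand_1", "rand_2", "rand_3"]  # injected random features
rows = [{name: rng.gauss(0.0, 1.0) for name in real + probes} for _ in range(200)]
targets = [2.0 * r["x1"] - 1.5 * r["x2"] for r in rows]

# Hand-written stand-in for an evolved model that picked up small probe terms.
model = {"x1": 2.0, "x2": -1.5, "rand_1": 0.05, "rand_2": -0.03, "rand_3": 0.02}

impact = removal_impact(model, rows, targets)
selected, threshold = select_above_probes(impact, set(probes))
print("selected:", selected)
print("threshold:", threshold)
```

In this sketch the threshold comes from the data rather than from a hand-tuned constant: a real feature is kept only if deleting it hurts the model more than deleting any feature known to be noise.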
Page(s): 2269 - 2282
Date of Publication: 11 March 2024
Electronic ISSN: 2471-285X
