Abstract
ReliefF is a representative and efficient feature selection algorithm; however, in the presence of missing data, ReliefF and its variants may fail. To address this problem, a novel feature selection method, repetitive feature selection based on improved ReliefF, is proposed to obtain the optimal feature subset and impute missing data accurately. The main idea is three-fold: 1) the data distribution, characterized by the distance to each class center, is incorporated into the feature weights to construct a proper objective function, which helps select significant, highly relevant features while removing redundant or noisy ones; 2) the improved ReliefF is applied both before and after imputation to make full use of the known data, and a non-negative matrix factorization (NMF) model is established to impute the missing data soundly; and 3) during NMF model learning, the mini-batch gradient descent (MBGD) technique is employed to accelerate convergence and avoid becoming trapped in local optima. Experiments on seven public data sets demonstrate the effectiveness of the proposed method.
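To make the imputation step concrete, the sketch below shows one generic way to fill missing entries by fitting a non-negative factorization X ≈ WH to the observed cells with mini-batch gradient descent. This is a minimal illustration of the NMF-plus-MBGD idea, not the paper's exact model; the function name `nmf_impute`, the rank, learning rate, and the non-negativity projection step are all assumptions.

```python
import numpy as np

def nmf_impute(X, rank=2, lr=0.05, epochs=300, batch=32, seed=0):
    """Fill NaN entries of a non-negative matrix X by fitting
    X ~ W @ H on the observed cells with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    mask = ~np.isnan(X)                       # observed entries
    rows, cols = np.nonzero(mask)
    vals = X[rows, cols]
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(epochs):
        order = rng.permutation(len(vals))
        for start in range(0, len(vals), batch):
            idx = order[start:start + batch]
            r, c, v = rows[idx], cols[idx], vals[idx]
            pred = np.einsum('ik,ki->i', W[r], H[:, c])   # per-cell W_r . H_c
            err = pred - v                    # residual on this mini-batch
            gW = err[:, None] * H[:, c].T     # d(err^2)/dW rows, up to a factor 2
            gH = err[None, :] * W[r].T        # d(err^2)/dH cols, up to a factor 2
            np.add.at(W, r, -lr * gW)         # accumulate repeated row indices
            np.add.at(H, (slice(None), c), -lr * gH)
            W = np.maximum(W, 0.0)            # project back onto non-negativity
            H = np.maximum(H, 0.0)
    X_hat = W @ H
    return np.where(mask, X, X_hat)           # keep known values, fill the rest
```

Observed entries are returned untouched; only the NaN cells receive the low-rank reconstruction W @ H, and the projection after each mini-batch keeps both factors non-negative.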


Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 62073223, the Natural Science Foundation of Shanghai under Grant 22ZR1443400, and the Open Project of the Key Laboratory of Aerospace Flight Dynamics and National Defense Science and Technology under Grant 6142210200304.
Ethics declarations
Conflict of Interests
We declare that no conflict of interest exists in the submission of this manuscript, and that the manuscript has been approved by all authors for publication.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Fan, H., Xue, L., Song, Y. et al. A repetitive feature selection method based on improved ReliefF for missing data. Appl Intell 52, 16265–16280 (2022). https://doi.org/10.1007/s10489-022-03327-4