Abstract
Feature selection is an essential preprocessing step in solving classification problems. The Relief algorithm and its derivatives have proven to be a successful class of feature selectors. However, their computational cost is very high on large-scale datasets. To address this problem, we propose a fast ReliefF algorithm based on the information granulation of instances (IG-FReliefF). The algorithm uses K-means to granulate the dataset, selects the significant granules using criteria defined by information entropy and information granulation, and then evaluates each feature on the dataset composed of the selected granules. Extensive experiments show that the proposed algorithm is more efficient than existing representative algorithms, especially on large-scale datasets, while achieving almost the same classification performance as the algorithms it is compared with.
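The three stages named in the abstract (granulate with K-means, select granules, weight features on the reduced data) can be illustrated with a minimal Python sketch. It is not the authors' IG-FReliefF: the abstract does not state the exact selection criteria, so the sketch assumes that granules with the highest class entropy are the "significant" ones and omits the information-granulation criterion entirely. The weighting step is a simplified Relief with one nearest hit and one nearest miss, not full ReliefF. All function names and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster (granule) label per instance."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every instance to every center: (n, k)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # leave an empty cluster's center as is
                centers[j] = members.mean(axis=0)
    return labels

def class_entropy(y):
    """Shannon entropy (in bits) of the class labels inside one granule."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum() + 0.0)

def select_granules(X, y, k, keep_ratio=0.5, seed=0):
    """Keep the instances of the highest-entropy granules.

    ASSUMPTION: 'significant' is read here as 'class-mixed'; the paper's
    actual criteria also involve information granulation.
    """
    labels = kmeans(X, k, seed=seed)
    ids = np.unique(labels)
    scores = np.array([class_entropy(y[labels == g]) for g in ids])
    n_keep = max(1, int(np.ceil(keep_ratio * len(ids))))
    kept = ids[np.argsort(-scores)[:n_keep]]
    mask = np.isin(labels, kept)
    return X[mask], y[mask]

def relief_weights(X, y):
    """Simplified Relief: one nearest hit and one nearest miss per instance."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                  # constant features: avoid 0/0
    Xn = (X - X.min(axis=0)) / span        # scale each feature to [0, 1]
    w = np.zeros(Xn.shape[1])
    used = 0
    for i in range(len(Xn)):
        diff = np.abs(Xn - Xn[i])          # per-feature differences to instance i
        dist = diff.sum(axis=1)            # Manhattan distance
        dist[i] = np.inf                   # exclude the instance itself
        hit_d = np.where(y == y[i], dist, np.inf)
        miss_d = np.where(y != y[i], dist, np.inf)
        if not (np.isfinite(hit_d).any() and np.isfinite(miss_d).any()):
            continue                       # no same-class or other-class neighbor
        # Reward features that differ on the miss, penalize those that differ on the hit
        w += diff[miss_d.argmin()] - diff[hit_d.argmin()]
        used += 1
    return w / max(used, 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=300)
    # Feature 0 carries class information, feature 1 is pure noise
    X = np.column_stack([y + rng.normal(0, 0.6, 300), rng.normal(0, 1, 300)])
    Xs, ys = select_granules(X, y, k=10, keep_ratio=0.5)
    print(len(X), "->", len(Xs), "instances; weights:", relief_weights(Xs, ys))
```

The speed-up in such a scheme would come from the weighting loop, whose cost grows quadratically with the number of instances, running only on the retained granules rather than on the full dataset.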
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61772323, 61976184, 61876103).
Cite this article
Wei, W., Wang, D. & Liang, J. Accelerating ReliefF using information granulation. Int. J. Mach. Learn. & Cyber. 13, 29–38 (2022). https://doi.org/10.1007/s13042-021-01334-4
DOI: https://doi.org/10.1007/s13042-021-01334-4