
Accelerating ReliefF using information granulation

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Feature selection is an essential preprocessing step when solving a classification problem. In this respect, the Relief algorithm and its derivatives have proved to be a successful class of feature selectors. However, the computational cost of these algorithms becomes very high when large-scale datasets are processed. To address this problem, we propose a fast ReliefF algorithm based on the information granulation of instances (IG-FReliefF). The algorithm uses K-means to granulate the dataset, selects the significant granules according to criteria defined by information entropy and information granulation, and then evaluates each feature on the dataset composed of the selected granules. Extensive experiments show that the proposed algorithm is more efficient than existing representative algorithms, especially on large-scale datasets, while achieving almost the same classification performance as the compared algorithms.
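The abstract describes a two-stage pipeline: granulate the instances with K-means, keep only the significant granules, and then run ReliefF on the instances of the retained granules. The sketch below illustrates that pipeline in Python; it is not the authors' implementation. In particular, the granule-significance score here uses plain class entropy as a stand-in for the paper's information-entropy and information-granulation criteria, and the function names, keep_ratio, and the sampling size are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def select_granules(X, y, n_granules=20, keep_ratio=0.5, random_state=0):
    """Granulate instances with K-means and keep the granules whose class
    distribution is most informative (class entropy used here as a
    stand-in for the paper's entropy/granulation criteria)."""
    labels = KMeans(n_clusters=n_granules, n_init=10,
                    random_state=random_state).fit_predict(X)
    scores = []
    for g in range(n_granules):
        idx = np.where(labels == g)[0]
        if idx.size == 0:
            continue                                          # K-means rarely leaves a cluster empty
        _, counts = np.unique(y[idx], return_counts=True)
        p = counts / counts.sum()
        scores.append((g, float(-(p * np.log2(p)).sum())))    # granule class entropy
    scores.sort(key=lambda t: t[1])                           # purer granules first
    kept = [g for g, _ in scores[: max(1, int(keep_ratio * n_granules))]]
    mask = np.isin(labels, kept)
    return X[mask], y[mask]


def relieff(X, y, n_neighbors=10, n_samples=200, random_state=0):
    """Standard ReliefF feature weights, computed on the reduced dataset."""
    n, d = X.shape
    rng = np.random.default_rng(random_state)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                                     # avoid division by zero
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / n))
    weights = np.zeros(d)
    for i in rng.choice(n, size=min(n, n_samples), replace=False):
        diffs = np.abs(X - X[i]) / span                       # per-feature differences to instance i
        dist = diffs.sum(axis=1)
        dist[i] = np.inf                                      # exclude the instance itself
        for c in classes:
            idx = np.where(y == c)[0]
            idx = idx[np.argsort(dist[idx])][:n_neighbors]    # k nearest neighbours of class c
            contrib = diffs[idx].mean(axis=0)
            if c == y[i]:
                weights -= contrib                            # nearest hits: differing features penalised
            else:
                weights += priors[c] / (1 - priors[y[i]]) * contrib  # nearest misses: rewarded
    return weights
```

A typical use would be Xr, yr = select_granules(X, y) followed by weights = relieff(Xr, yr); ranking with np.argsort(weights)[::-1] then gives the candidate feature ordering, with ReliefF's neighbour searches running only over the retained granules rather than the full dataset.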





Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61772323, 61976184, 61876103).

Author information

Corresponding author

Correspondence to Jiye Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wei, W., Wang, D. & Liang, J. Accelerating ReliefF using information granulation. Int. J. Mach. Learn. & Cyber. 13, 29–38 (2022). https://doi.org/10.1007/s13042-021-01334-4

