
Iteratively local fisher score for feature selection


Abstract

In machine learning, feature selection is an important dimensionality reduction technique that aims to choose the features with the best discriminant ability and thus avoid the curse of dimensionality in subsequent processing. As a supervised feature selection method, the Fisher score (FS) provides a feature evaluation criterion and has been widely used. However, FS ignores the association between features because it assesses each feature independently, and it loses local information by fully connecting within-class samples. To address these issues, this paper proposes a novel feature evaluation criterion based on FS, named the iteratively local Fisher score (ILFS). Compared with FS, the new criterion pays more attention to the local structure of the data by using K nearest neighbours instead of all samples when calculating the within-class and between-class scatters. To take the relationship between features into account, we compute local Fisher scores for feature subsets rather than for single features, and iteratively select the currently optimal feature in the manner of sequential forward selection (SFS). Experimental results on the UCI and TEP data sets show that the improved algorithm performs well in classification tasks compared with other state-of-the-art methods.
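The selection procedure summarised above can be sketched in code. The exact scatter definitions and the selection rule (referred to in the appendix as rule (5)) are not reproduced in this excerpt, so the snippet below is only a hypothetical illustration: it assumes additive per-feature local scatters computed over K nearest neighbours and a greedy, SFS-like step that adds the feature maximising the subset score; the function names, the neighbourhood construction, and the offset delta are assumptions, not the authors' exact formulation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def local_scatters(X, y, n_neighbors=5):
    """Per-feature local between-class and within-class scatters (illustrative).

    For each sample, its K nearest same-class neighbours contribute to the
    within-class scatter and its K nearest other-class neighbours contribute
    to the between-class scatter, accumulated feature by feature.  The
    paper's exact definitions may differ.
    """
    s_b = np.zeros(X.shape[1])
    s_w = np.zeros(X.shape[1])
    for c in np.unique(y):
        same, other = X[y == c], X[y != c]
        k_w, k_b = min(n_neighbors, len(same) - 1), min(n_neighbors, len(other))
        if k_w > 0:
            # the first returned neighbour is the query point itself, hence k_w + 1
            _, idx = NearestNeighbors(n_neighbors=k_w + 1).fit(same).kneighbors(same)
            s_w += np.sum((same[:, None, :] - same[idx[:, 1:]]) ** 2, axis=(0, 1))
        if k_b > 0:
            _, idx = NearestNeighbors(n_neighbors=k_b).fit(other).kneighbors(same)
            s_b += np.sum((same[:, None, :] - other[idx]) ** 2, axis=(0, 1))
    return s_b, s_w


def ilfs(X, y, n_select, n_neighbors=5, delta=1e-12):
    """Greedy, SFS-like selection that maximises the subset score J(G)."""
    s_b, s_w = local_scatters(X, y, n_neighbors)
    selected, remaining = [], list(range(X.shape[1]))
    S_b, S_w = 0.0, delta  # the empty subset starts with a small within-class offset
    for _ in range(min(n_select, len(remaining))):
        best = max(remaining, key=lambda k: (S_b + s_b[k]) / (S_w + s_w[k]))
        S_b, S_w = S_b + s_b[best], S_w + s_w[best]
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, ilfs(X, y, n_select=10) would return the indices of ten features in the order in which the greedy criterion picked them.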

Acknowledgments

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Corresponding author

Correspondence to Li Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, and the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Appendix A: Proof of Theorem 1
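Theorem 1 itself is stated in the main text, which is not reproduced in this excerpt. As a paraphrase of the property that the proof below establishes (not the authors' exact wording), the subset scores produced by the greedy selection rule (5) form a monotonically non-increasing sequence:

$$ J(G(t+1))\leq J(G(t)), \qquad t=1,\cdots,m-1 $$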

Proof

We prove Theorem 1 by mathematical induction. First, we generate the sequence {J(G(t))}, t = 1,⋯ ,m, using rule (5).

For t = 1, G(0) = ∅ and \(\overline {G(0)}=\{1,\cdots ,m\}\). The highest score is obtained by

$$ \begin{array}{@{}rcl@{}} J(G(1))=\frac{s_{b}^{G(1)}}{s_{w}^{G(1)}}=\max_{k\in\overline{G(0)}}\frac{s_{b}^{G(0)}+s^{\{k\}}_{b}}{s_{w}^{G(0)}+s^{\{k\}}_{w}}=\max_{k\in\overline{G(0)}}\frac{s^{\{k\}}_{b}}{s^{\{k\}}_{w}} \end{array} $$
(12)

where \(s_{b}^{G(0)}=0\) and \(s_{w}^{G(0)}=\delta \). In other words, J(G(1)) is obtained by maximizing the local Fisher score of a single feature. Then \(G(1)=\{k_{1}^{*}\}\) according to (5) and \(\overline {G(1)}=\{1,\cdots ,m\}-G(1)\), where \(k_{1}^{*}\) is the optimal feature selected in the first iteration. From now on, \(k_{t}^{*}\) denotes the optimal feature selected in the t-th iteration.

For t = 2, J(G(2)) can be represented by

$$ J(G(2))=\frac{s_{b}^{G(2)}}{s_{w}^{G(2)}}=\frac{s_{b}^{G(1)}+s^{\{k_{2}^{*}\}}_{b}}{s_{w}^{G(1)}+s^{\{k_{2}^{*}\}}_{w}}=\max_{k\in\overline{G(1)}}\frac{s_{b}^{G(1)}+s^{\{k\}}_{b}}{s_{w}^{G(1)}+s^{\{k\}}_{w}} $$
(13)

Naturally, the score of the feature subset \(\{k_{2}^{*}\}\) is less than or equal to that of the feature subset \(\{k_{1}^{*}\}\). Namely

$$ \begin{array}{@{}rcl@{}} \frac{s^{\{k_{2}^{*}\}}_{b}}{s^{\{k_{2}^{*}\}}_{w}}\leq \frac{s_{b}^{G(1)}}{s_{w}^{G(1)}} \end{array} $$
(14)

By comparing (12) and (13), we have

$$ \begin{array}{@{}rcl@{}} J(G(2))-J(G(1))&=&\frac{s_{b}^{G(2)}}{s_{w}^{G(2)}}-\frac{s_{b}^{G(1)}}{s_{w}^{G(1)}}\\ &=&\frac{s_{b}^{G(1)}+s^{\{k_{2}^{*}\}}_{b}}{s_{w}^{G(1)}+s^{\{k_{2}^{*}\}}_{w}}-\frac{s_{b}^{G(1)}}{s_{w}^{G(1)}}\\ &=&\frac{s^{\{k_{2}^{*}\}}_{b}s_{w}^{G(1)}-s_{b}^{G(1)}s^{\{k_{2}^{*}\}}_{w}}{\left( s_{w}^{G(1)}+s^{\{k_{2}^{*}\}}_{w}\right)s_{w}^{G(1)}} \end{array} $$
(15)

Note that \(s_{w}^{G(1)}>0\) and \(s^{\{k_{2}^{*}\}}_{w}\geq 0\). Cross-multiplying (14) gives \(s^{\{k_{2}^{*}\}}_{b}s_{w}^{G(1)}-s_{b}^{G(1)}s^{\{k_{2}^{*}\}}_{w}\leq 0\), so the numerator of (15) is non-positive. Substituting this into (15), we have

$$ \begin{array}{@{}rcl@{}} J(G(2))-J(G(1))\leq 0 \end{array} $$
(16)

which means that Theorem 1 holds in the first two iterations.

Now, we assume that Theorem 1 holds in the (t − 1)-th and t-th iterations. Namely,

$$ \begin{array}{@{}rcl@{}} J(G(t))-J(G(t-1))\leq 0 \end{array} $$
(17)

For any t, J(G(t)) has the form:

$$ \begin{array}{@{}rcl@{}} J(G(t))&=&\frac{s_{b}^{G(t)}}{s_{w}^{G(t)}}=\frac{s_{b}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{w}}\\&=&\max_{k\in\overline{G(t-1)}}\frac{s_{b}^{G(t-1)}+s^{\{k\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k\}}_{w}} \end{array} $$
(18)

The difference between J(G(t + 1)) and J(G(t)) can be written as

$$ \begin{array}{@{}rcl@{}} J(G(t+1)) - J(G(t))&=&\frac{s_{b}^{G(t+1)}}{s_{w}^{G(t+1)}}-\frac{s_{b}^{G(t)}}{s_{w}^{G(t)}}\\ &=&\frac{s_{b}^{G(t)}+s^{\{k_{t+1}^{*}\}}_{b}}{s_{w}^{G(t)}+s^{\{k_{t+1}^{*}\}}_{w}}-\frac{s_{b}^{G(t)}}{s_{w}^{G(t)}}\\ &=&\frac{s^{\{k_{t+1}^{*}\}}_{b}s_{w}^{G(t)}-s_{b}^{G(t)}s^{\{k_{t+1}^{*}\}}_{w}}{\left( s_{w}^{G(t)}+s^{\{k_{t+1}^{*}\}}_{w}\right)s_{w}^{G(t)}} \end{array} $$
(19)

Because the denominator of (19) is greater than 0, we need only consider the numerator of (19), which can be further written as

$$ \begin{array}{@{}rcl@{}} &&s^{\{k_{t+1}^{*}\}}_{b}s_{w}^{G(t)}-s_{b}^{G(t)}s^{\{k_{t+1}^{*}\}}_{w}\\ &=& s^{\{k_{t+1}^{*}\}}_{b}\left( s_{w}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{w}\right) -\left( s_{b}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{b}\right)s^{\{k_{t+1}^{*}\}}_{w}\\ &=&\left( s^{\{k_{t+1}^{*}\}}_{b} s_{w}^{G(t-1)}-s_{b}^{G(t-1)} s^{\{k_{t+1}^{*}\}}_{w}\right)\\&&+\left( s^{\{k_{t+1}^{*}\}}_{b} s^{\{k_{t}^{*}\}}_{w}-s^{\{k_{t}^{*}\}}_{b} s^{\{k_{t+1}^{*}\}}_{w}\right) \end{array} $$
(20)

Because (17) holds true, we have the following inequalities

$$ \begin{array}{@{}rcl@{}} \frac{s_{b}^{G(t-1)}+s^{\{k\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k\}}_{w}} \leq \frac{s_{b}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{w}} \leq \frac{s_{b}^{G(t-1)}}{s_{w}^{G(t-1)}} \end{array} $$
(21)

where \(k,k^{*}_{t}\in \overline {G(t-1)}\) and \(k\neq k^{*}_{t}\). Expanding the three inequalities separately, we get

$$ \frac{s_{b}^{G(t-1)} + s^{\{k_{t}^{*}\}}_{b}}{s_{w}^{G(t-1)} + s^{\{k_{t}^{*}\}}_{w}} \!\leq\! \frac{s_{b}^{G(t-1)}}{s_{w}^{G(t-1)}} \!\Rightarrow\! s_{w}^{G(t-1)}s^{\{k_{t}^{*}\}}_{b}-s_{b}^{G(t-1)}s^{\{k_{t}^{*}\}}_{w} \!\leq\! 0 $$
(22)
$$ \frac{s_{b}^{G(t-1)}+s^{\{k\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k\}}_{w}} \leq \frac{s_{b}^{G(t-1)}}{s_{w}^{G(t-1)}}\Rightarrow s_{w}^{G(t-1)}s^{\{k\}}_{b}-s_{b}^{G(t-1)}s^{\{k\}}_{w} \leq 0 $$
(23)

and

$$ \begin{array}{@{}rcl@{}} &&~~~\frac{s_{b}^{G(t-1)}+s^{\{k\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k\}}_{w}} \leq \frac{s_{b}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{w}} \\ &&\Rightarrow \left( s^{\{k\}}_{b} s^{\{k_{t}^{*}\}}_{w}-s^{\{k_{t}^{*}\}}_{b} s^{\{k\}}_{w} \right)+\left( s^{\{k\}}_{b} s_{w}^{G(t-1)}-s_{b}^{G(t-1)} s^{\{k\}}_{w}\right) \\ &&\leq \left( s_{w}^{G(t-1)}s^{\{k_{t}^{*}\}}_{b}-s_{b}^{G(t-1)}s^{\{k_{t}^{*}\}}_{w}\right) \leq 0 \end{array} $$
(24)

Because (24) holds true for \(\forall k\in \overline {G(t-1)}\) and \(k_{t+1}^{*} \in \overline {G(t)} \subset \overline {G(t-1)}\) in (20), we can rewrite (20) as

$$ \begin{array}{@{}rcl@{}} s^{\{k_{t+1}^{*}\}}_{b}s_{w}^{G(t)}-s_{b}^{G(t)}s^{\{k_{t+1}^{*}\}}_{w}\leq 0 \end{array} $$
(25)

Substituting (25) into (19), we have

$$ \begin{array}{@{}rcl@{}} J(G(t+1))-J(G(t))\leq 0 \end{array} $$
(26)

Consequently, by mathematical induction, the proof of Theorem 1 is complete.□
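As an informal sanity check of this monotonicity (not part of the original paper), one can simulate the greedy rule with synthetic non-negative per-feature scatters and verify that the subset scores never increase; everything in the snippet below, including the offset delta, is illustrative.

```python
import numpy as np

# Synthetic per-feature scatters; within-class scatters are bounded away from
# zero so that the small offset delta is negligible, as assumed in the proof.
rng = np.random.default_rng(0)
m, delta = 50, 1e-12
s_b = rng.random(m)
s_w = rng.uniform(0.1, 1.0, m)

S_b, S_w = 0.0, delta                      # scatters of the empty subset G(0)
remaining, scores = list(range(m)), []
for _ in range(m):
    # greedy rule: add the feature that maximises the subset score J(G)
    best = max(remaining, key=lambda k: (S_b + s_b[k]) / (S_w + s_w[k]))
    S_b, S_w = S_b + s_b[best], S_w + s_w[best]
    remaining.remove(best)
    scores.append(S_b / S_w)

# Theorem 1: the sequence of scores is non-increasing
assert all(scores[t + 1] <= scores[t] + 1e-9 for t in range(m - 1))
print("first five scores:", np.round(scores[:5], 4))
```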


Cite this article

Gan, M., Zhang, L. Iteratively local fisher score for feature selection. Appl Intell 51, 6167–6181 (2021). https://doi.org/10.1007/s10489-020-02141-0
