
Iteratively local fisher score for feature selection


Abstract

In machine learning, feature selection is an important dimensionality reduction technique that aims to choose the features with the best discriminant ability and thus avoid the curse of dimensionality in subsequent processing. As a supervised feature selection method, the Fisher score (FS) provides a feature evaluation criterion and has been widely used. However, FS ignores the association between features because it assesses each feature independently, and it loses local information by fully connecting within-class samples. To address these issues, this paper proposes a novel feature evaluation criterion based on FS, named the iteratively local Fisher score (ILFS). Compared with FS, the new criterion pays more attention to the local structure of the data by using K nearest neighbours instead of all samples when calculating the within-class and between-class scatters. To take the relationship between features into account, we compute local Fisher scores for feature subsets rather than for single features, and iteratively select the currently optimal feature in the manner of sequential forward selection (SFS). Experimental results on the UCI and TEP data sets show that the improved algorithm performs well in classification tasks compared with other state-of-the-art methods.
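The selection procedure summarised above can be sketched in code. The exact scatter definitions and the selection rule (referred to in the appendix as rule (5)) are not reproduced in this excerpt, so the snippet below is only a hypothetical illustration: it assumes additive per-feature local scatters computed over K nearest neighbours and a greedy, SFS-like step that adds the feature maximising the subset score; the function names, the neighbourhood construction, and the offset delta are assumptions, not the authors' exact formulation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def local_scatters(X, y, n_neighbors=5):
    """Per-feature local between-class and within-class scatters (illustrative).

    For each sample, its K nearest same-class neighbours contribute to the
    within-class scatter and its K nearest other-class neighbours contribute
    to the between-class scatter, accumulated feature by feature.  The
    paper's exact definitions may differ.
    """
    s_b = np.zeros(X.shape[1])
    s_w = np.zeros(X.shape[1])
    for c in np.unique(y):
        same, other = X[y == c], X[y != c]
        k_w, k_b = min(n_neighbors, len(same) - 1), min(n_neighbors, len(other))
        if k_w > 0:
            # the first returned neighbour is the query point itself, hence k_w + 1
            _, idx = NearestNeighbors(n_neighbors=k_w + 1).fit(same).kneighbors(same)
            s_w += np.sum((same[:, None, :] - same[idx[:, 1:]]) ** 2, axis=(0, 1))
        if k_b > 0:
            _, idx = NearestNeighbors(n_neighbors=k_b).fit(other).kneighbors(same)
            s_b += np.sum((same[:, None, :] - other[idx]) ** 2, axis=(0, 1))
    return s_b, s_w


def ilfs(X, y, n_select, n_neighbors=5, delta=1e-12):
    """Greedy, SFS-like selection that maximises the subset score J(G)."""
    s_b, s_w = local_scatters(X, y, n_neighbors)
    selected, remaining = [], list(range(X.shape[1]))
    S_b, S_w = 0.0, delta  # the empty subset starts with a small within-class offset
    for _ in range(min(n_select, len(remaining))):
        best = max(remaining, key=lambda k: (S_b + s_b[k]) / (S_w + s_w[k]))
        S_b, S_w = S_b + s_b[best], S_w + s_w[best]
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, ilfs(X, y, n_select=10) would return the indices of ten features in the order in which the greedy criterion picked them.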

Acknowledgments

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Corresponding author

Correspondence to Li Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, and the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Appendix A: Proof of Theorem 1
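Theorem 1 itself is stated in the main text, which is not reproduced in this excerpt. As a paraphrase of the property that the proof below establishes (not the authors' exact wording), the subset scores produced by the greedy selection rule (5) form a monotonically non-increasing sequence:

$$ J(G(t+1))\leq J(G(t)), \qquad t=1,\cdots,m-1 $$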

Proof

We prove Theorem 1 by mathematical induction. First, we generate the sequence {J(G(t))}, t = 1,⋯ ,m, using rule (5).

For t = 1, G(0) = ∅ and \(\overline {G(0)}=\{1,\cdots ,m\}\). The highest score is obtained by

$$ \begin{array}{@{}rcl@{}} J(G(1))=\frac{s_{b}^{G(1)}}{s_{w}^{G(1)}}=\max_{k\in\overline{G(0)}}\frac{s_{b}^{G(0)}+s^{\{k\}}_{b}}{s_{w}^{G(0)}+s^{\{k\}}_{w}}=\max_{k\in\overline{G(0)}}\frac{s^{\{k\}}_{b}}{s^{\{k\}}_{w}} \end{array} $$
(12)

where \(s_{b}^{G(0)}=0\) and \(s_{w}^{G(0)}=\delta \). In other words, J(G(1)) is obtained by maximizing the local Fisher score of a single feature. Then \(G(1)=\{k_{1}^{*}\}\) according to (5) and \(\overline {G(1)}=\{1,\cdots ,m\}-G(1)\), where \(k_{1}^{*}\) is the optimal feature selected in the first iteration. From now on, \(k_{t}^{*}\) denotes the optimal feature selected in the t-th iteration.

For t = 2, J(G(2)) can be represented by

$$ J(G(2))=\frac{s_{b}^{G(2)}}{s_{w}^{G(2)}}=\frac{s_{b}^{G(1)}+s^{\{k_{2}^{*}\}}_{b}}{s_{w}^{G(1)}+s^{\{k_{2}^{*}\}}_{w}}=\max_{k\in\overline{G(1)}}\frac{s_{b}^{G(1)}+s^{\{k\}}_{b}}{s_{w}^{G(1)}+s^{\{k\}}_{w}} $$
(13)

Naturally, the score of the feature subset \(\{k_{2}^{*}\}\) is less than or equal to that of the feature subset \(\{k_{1}^{*}\}\). Namely

$$ \begin{array}{@{}rcl@{}} \frac{s^{\{k_{2}^{*}\}}_{b}}{s^{\{k_{2}^{*}\}}_{w}}\leq \frac{s_{b}^{G(1)}}{s_{w}^{G(1)}} \end{array} $$
(14)

By comparing (12) and (13), we have

$$ \begin{array}{@{}rcl@{}} J(G(2))-J(G(1))&=&\frac{s_{b}^{G(2)}}{s_{w}^{G(2)}}-\frac{s_{b}^{G(1)}}{s_{w}^{G(1)}}\\ &=&\frac{s_{b}^{G(1)}+s^{\{k_{2}^{*}\}}_{b}}{s_{w}^{G(1)}+s^{\{k_{2}^{*}\}}_{w}}-\frac{s_{b}^{G(1)}}{s_{w}^{G(1)}}\\ &=&\frac{s^{\{k_{2}^{*}\}}_{b}s_{w}^{G(1)}-s_{b}^{G(1)}s^{\{k_{2}^{*}\}}_{w}}{\left( s_{w}^{G(1)}+s^{\{k_{2}^{*}\}}_{w}\right)s_{w}^{G(1)}} \end{array} $$
(15)

Note that \(s_{w}^{G(1)}>0\) and \(s^{\{k_{2}^{*}\}}_{w}\geq 0\). Cross-multiplying (14) gives \(s^{\{k_{2}^{*}\}}_{b}s_{w}^{G(1)}-s_{b}^{G(1)}s^{\{k_{2}^{*}\}}_{w}\leq 0\), so the numerator of (15) is non-positive. Substituting this into (15), we have

$$ \begin{array}{@{}rcl@{}} J(G(2))-J(G(1))\leq 0 \end{array} $$
(16)

which means that Theorem 1 holds in the first two iterations.

Now, we assume that Theorem 1 holds in the (t − 1)-th and t-th iterations. Namely,

$$ \begin{array}{@{}rcl@{}} J(G(t))-J(G(t-1))\leq 0 \end{array} $$
(17)

For any t, J(G(t)) has the form:

$$ \begin{array}{@{}rcl@{}} J(G(t))&=&\frac{s_{b}^{G(t)}}{s_{w}^{G(t)}}=\frac{s_{b}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{w}}\\&=&\max_{k\in\overline{G(t-1)}}\frac{s_{b}^{G(t-1)}+s^{\{k\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k\}}_{w}} \end{array} $$
(18)

The difference between J(G(t + 1)) and J(G(t)) can be written as

$$ \begin{array}{@{}rcl@{}} J(G(t+1)) - J(G(t))&=&\frac{s_{b}^{G(t+1)}}{s_{w}^{G(t+1)}}-\frac{s_{b}^{G(t)}}{s_{w}^{G(t)}}\\ &=&\frac{s_{b}^{G(t)}+s^{\{k_{t+1}^{*}\}}_{b}}{s_{w}^{G(t)}+s^{\{k_{t+1}^{*}\}}_{w}}-\frac{s_{b}^{G(t)}}{s_{w}^{G(t)}}\\ &=&\frac{s^{\{k_{t+1}^{*}\}}_{b}s_{w}^{G(t)}-s_{b}^{G(t)}s^{\{k_{t+1}^{*}\}}_{w}}{\left( s_{w}^{G(t)}+s^{\{k_{t+1}^{*}\}}_{w}\right)s_{w}^{G(t)}} \end{array} $$
(19)

Because the denominator of (19) is greater than 0, we need only consider the numerator of (19), which can be further written as

$$ \begin{array}{@{}rcl@{}} &&s^{\{k_{t+1}^{*}\}}_{b}s_{w}^{G(t)}-s_{b}^{G(t)}s^{\{k_{t+1}^{*}\}}_{w}\\ &=& s^{\{k_{t+1}^{*}\}}_{b}\left( s_{w}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{w}\right) -\left( s_{b}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{b}\right)s^{\{k_{t+1}^{*}\}}_{w}\\ &=&\left( s^{\{k_{t+1}^{*}\}}_{b} s_{w}^{G(t-1)}-s_{b}^{G(t-1)} s^{\{k_{t+1}^{*}\}}_{w}\right)\\&&+\left( s^{\{k_{t+1}^{*}\}}_{b} s^{\{k_{t}^{*}\}}_{w}-s^{\{k_{t}^{*}\}}_{b} s^{\{k_{t+1}^{*}\}}_{w}\right) \end{array} $$
(20)

Because (17) holds true, we have the following inequalities

$$ \begin{array}{@{}rcl@{}} \frac{s_{b}^{G(t-1)}+s^{\{k\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k\}}_{w}} \leq \frac{s_{b}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{w}} \leq \frac{s_{b}^{G(t-1)}}{s_{w}^{G(t-1)}} \end{array} $$
(21)

where \(k,k^{*}_{t}\in \overline {G(t-1)}\) and \(k\neq k^{*}_{t}\). Expanding the three inequalities separately, we get

$$ \frac{s_{b}^{G(t-1)} + s^{\{k_{t}^{*}\}}_{b}}{s_{w}^{G(t-1)} + s^{\{k_{t}^{*}\}}_{w}} \!\leq\! \frac{s_{b}^{G(t-1)}}{s_{w}^{G(t-1)}} \!\Rightarrow\! s_{w}^{G(t-1)}s^{\{k_{t}^{*}\}}_{b}-s_{b}^{G(t-1)}s^{\{k_{t}^{*}\}}_{w} \!\leq\! 0 $$
(22)
$$ \frac{s_{b}^{G(t-1)}+s^{\{k\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k\}}_{w}} \leq \frac{s_{b}^{G(t-1)}}{s_{w}^{G(t-1)}}\Rightarrow s_{w}^{G(t-1)}s^{\{k\}}_{b}-s_{b}^{G(t-1)}s^{\{k\}}_{w} \leq 0 $$
(23)

and

$$ \begin{array}{@{}rcl@{}} &&~~~\frac{s_{b}^{G(t-1)}+s^{\{k\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k\}}_{w}} \leq \frac{s_{b}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{b}}{s_{w}^{G(t-1)}+s^{\{k_{t}^{*}\}}_{w}} \\ &&\Rightarrow \left( s^{\{k\}}_{b} s^{\{k_{t}^{*}\}}_{w}-s^{\{k_{t}^{*}\}}_{b} s^{\{k\}}_{w} \right)+\left( s^{\{k\}}_{b} s_{w}^{G(t-1)}-s_{b}^{G(t-1)} s^{\{k\}}_{w}\right) \\ &&\leq \left( s_{w}^{G(t-1)}s^{\{k_{t}^{*}\}}_{b}-s_{b}^{G(t-1)}s^{\{k_{t}^{*}\}}_{w}\right) \leq 0 \end{array} $$
(24)

Because (24) holds true for \(\forall k\in \overline {G(t-1)}\) and \(k_{t+1}^{*} \in \overline {G(t)} \subset \overline {G(t-1)}\) in (20), we can rewrite (20) as

$$ \begin{array}{@{}rcl@{}} s^{\{k_{t+1}^{*}\}}_{b}s_{w}^{G(t)}-s_{b}^{G(t)}s^{\{k_{t+1}^{*}\}}_{w}\leq 0 \end{array} $$
(25)

Substituting (25) into (19), we have

$$ \begin{array}{@{}rcl@{}} J(G(t+1))-J(G(t))\leq 0 \end{array} $$
(26)

Consequently, by mathematical induction, the proof of Theorem 1 is complete.□
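As an informal sanity check of this monotonicity (not part of the original paper), one can simulate the greedy rule with synthetic non-negative per-feature scatters and verify that the subset scores never increase; everything in the snippet below, including the offset delta, is illustrative.

```python
import numpy as np

# Synthetic per-feature scatters; within-class scatters are bounded away from
# zero so that the small offset delta is negligible, as assumed in the proof.
rng = np.random.default_rng(0)
m, delta = 50, 1e-12
s_b = rng.random(m)
s_w = rng.uniform(0.1, 1.0, m)

S_b, S_w = 0.0, delta                      # scatters of the empty subset G(0)
remaining, scores = list(range(m)), []
for _ in range(m):
    # greedy rule: add the feature that maximises the subset score J(G)
    best = max(remaining, key=lambda k: (S_b + s_b[k]) / (S_w + s_w[k]))
    S_b, S_w = S_b + s_b[best], S_w + s_w[best]
    remaining.remove(best)
    scores.append(S_b / S_w)

# Theorem 1: the sequence of scores is non-increasing
assert all(scores[t + 1] <= scores[t] + 1e-9 for t in range(m - 1))
print("first five scores:", np.round(scores[:5], 4))
```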


Cite this article

Gan, M., Zhang, L. Iteratively local fisher score for feature selection. Appl Intell 51, 6167–6181 (2021). https://doi.org/10.1007/s10489-020-02141-0
