
Classifier selection using geometry preserving feature

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Selecting a proper classifier for a given data set is challenging. A critical problem in classifier selection is how to extract features from data sets. This paper proposes a new method for extracting a feature of a data set. Our method not only preserves the geometrical structure of a data set, but also characterizes the decision boundary of classification problems. Specifically, the extracted feature can recover a data set that has the same Euclidean geometrical structure as the original data set. We present an efficient algorithm to compute the similarity between data set features. We theoretically analyze how the similarity between our features affects the performance of the support vector machine, a well-known classifier. The empirical results show that our method is effective in finding suitable classifiers.
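
The geometry-preserving property can be made concrete through the Gram matrix: if the extracted feature behaves like a square root \(\textbf{G}^{1/2}\) of the Gram matrix \(\textbf{G}\) of a data set (the representation \(\textbf{x}_i = \textbf{G}^{1/2}\textbf{e}_i\) used in Appendix A), then the recovered points reproduce all pairwise Euclidean distances of the original points. The sketch below illustrates only this property and is not the paper's feature-extraction method; the function name gram_feature and the toy data are illustrative.

```python
import numpy as np

def gram_feature(X):
    """Return F with F @ F.T equal to the Gram matrix of X, so the rows of F
    reproduce all pairwise inner products and Euclidean distances of the rows
    of X (up to an orthogonal transform)."""
    G = X @ X.T                                # Gram matrix of the data set
    w, V = np.linalg.eigh(G)                   # G is symmetric positive semidefinite
    w = np.clip(w, 0.0, None)                  # guard against tiny negative eigenvalues
    return V @ np.diag(np.sqrt(w)) @ V.T       # symmetric square root G^(1/2)

def pairwise_distances(A):
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                    # toy data set: 6 points in R^3
F = gram_feature(X)

# Euclidean geometry is preserved: all pairwise distances match.
assert np.allclose(pairwise_distances(X), pairwise_distances(F), atol=1e-8)
```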


Data availability

The datasets analysed during the current study are available in the UCI machine learning repository, https://archive.ics.uci.edu/ml/index.php.

Notes

  1. For simplicity, we omit the class labels.


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61602308 and the Interdisciplinary Innovation Team of Shenzhen University.

Author information

Corresponding author

Correspondence to Wen-Sheng Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Theorem 2

Proof

Let \(h(\textbf{x}) = \textbf{w}^\top \textbf{x}\), \(h^\prime (\textbf{x}^\prime ) = {\textbf{w}^\prime }^\top \textbf{x}^\prime \) and \(\Delta \textbf{w} = \textbf{w} - \textbf{w}^\prime \). Then we have

$$\begin{aligned} \begin{aligned}&\ |h(\textbf{x})-h^\prime (\textbf{x}^\prime ) |= |\textbf{w}^\top \textbf{x} - {\textbf{w}^\prime }^\top \textbf{x}^\prime |\\ &=\ |(\textbf{w}^\top \textbf{x} - {\textbf{w}^\prime }^\top \textbf{x}) + ({\textbf{w}^\prime }^\top \textbf{x} - {\textbf{w}^\prime }^\top \textbf{x}^\prime )|\\ \leqslant&\ |\textbf{w}^\top \textbf{x} - {\textbf{w}^\prime }^\top \textbf{x} |+ |{\textbf{w}^\prime }^\top \textbf{x} - {\textbf{w}^\prime }^\top \textbf{x}^\prime |\\ \leqslant&\ \Vert \textbf{w} - {\textbf{w}^\prime }\Vert _2\Vert \textbf{x}\Vert _2 + \Vert \textbf{w}^\prime \Vert _2\Vert \textbf{x} - \textbf{x}^\prime \Vert _2. \end{aligned} \end{aligned}$$
(A1)

Since \(\textbf{w}\) and \(\textbf{w}^\prime \) are the minimizers of the associated SVM problems (12), for all \(t \in [0,1]\), we have

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\Vert \textbf{w}\Vert ^2_2 + C \sum _{i=1}^n L(y_i\textbf{w}^\top \textbf{x}_i) \\ \leqslant&\ \frac{1}{2}\Vert \textbf{w} + t \Delta \textbf{w} \Vert ^2_2 + C \sum _{i=1}^n L(y_i(\textbf{w} + t \Delta \textbf{w})^\top \textbf{x}_i), \end{aligned} \end{aligned}$$
(A2)

and

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\Vert \textbf{w}^\prime \Vert ^2_2 + C \sum _{i=1}^n L(y_i^\prime {\textbf{w}^\prime }^\top \textbf{x}_i^\prime ) \\ \leqslant&\ \frac{1}{2}\Vert \textbf{w}^\prime - t \Delta \textbf{w} \Vert ^2_2 + C \sum _{i=1}^n L(y_i^\prime (\textbf{w}^\prime - t \Delta \textbf{w})^\top \textbf{x}_i^\prime ). \end{aligned} \end{aligned}$$
(A3)

Summing (A2) and (A3), we obtain

$$\begin{aligned} \begin{aligned} t(1-t) \Vert \Delta \textbf{w} \Vert ^2_2&\leqslant C \sum _{i=1}^n \left[ \left( L(y_i(\textbf{w} + t \Delta \textbf{w})^\top \textbf{x}_i) - L(y_i\textbf{w}^\top \textbf{x}_i) \right) \right. \\&\ + \left. \left( L(y_i^\prime (\textbf{w}^\prime - t \Delta \textbf{w})^\top \textbf{x}_i^\prime ) - L(y_i^\prime {\textbf{w}^\prime }^\top \textbf{x}_i^\prime ) \right) \right] . \end{aligned} \end{aligned}$$
(A4)

Using the convexity of the hinge loss, we have

$$\begin{aligned} \begin{aligned}&L(y_i(\textbf{w} + t \Delta \textbf{w})^\top \textbf{x}_i) - L(y_i\textbf{w}^\top \textbf{x}_i) \\ \leqslant&\ t \left( L(y_i {\textbf{w}^\prime }^\top \textbf{x}_i) - L(y_i\textbf{w}^\top \textbf{x}_i) \right) \end{aligned} \end{aligned}$$
(A5)

and

$$\begin{aligned} \begin{aligned}&L(y_i^\prime (\textbf{w}^\prime - t \Delta \textbf{w})^\top \textbf{x}_i^\prime ) - L(y_i^\prime {\textbf{w}^\prime }^\top \textbf{x}_i^\prime ) \\ \leqslant&\ -t \left( L(y_i^\prime {\textbf{w}^\prime }^\top \textbf{x}_i^\prime ) - L(y_i^\prime \textbf{w}^\top \textbf{x}_i^\prime ) \right) . \end{aligned} \end{aligned}$$
(A6)

Combining (A4), (A5) and (A6) and taking the limit \(t \rightarrow 0\) leads to

$$\begin{aligned} \begin{aligned}&\Vert \textbf{w}^\prime -\textbf{w}\Vert ^2_2 \\ \leqslant&\ C\sum _{i=1}^n \left[ \left( L(y_i{\textbf{w}^\prime }^\top \textbf{x}_i)- L(y^\prime _i{\textbf{w}^\prime }^\top \textbf{x}^\prime _i) \right) \right. \\&\ + \left. \left( L(y^\prime _i \textbf{w}^\top \textbf{x}^\prime _i)- L(y_i \textbf{w}^\top \textbf{x}_i) \right) \right] \\ \leqslant&\ C\sum _{i=1}^n \Bigl [ \Vert \textbf{w}^\prime \Vert _2 \cdot \Vert y_i\phi (\textbf{x}_i) - y^\prime _i\phi (\textbf{x}^\prime _i)\Vert _2 \Bigr . \\&\ + \Bigl . \Vert \textbf{w}\Vert _2 \cdot \Vert y_i\phi (\textbf{x}_i) - y^\prime _i\phi (\textbf{x}^\prime _i)\Vert _2 \Bigr ]. \end{aligned} \end{aligned}$$
(A7)

The second inequality follows from the Lipschitz continuity of the hinge loss. We write \(\textbf{w}\) in terms of the dual variables \(\alpha _i\), namely \(\textbf{w} = \sum _{i=1}^n \alpha _i \textbf{x}_i\). Note that \(0 \leqslant \alpha _i \leqslant C\) and \(\textbf{x}_i = \textbf{G}^{1/2} \textbf{e}_i\). Then we obtain

$$\begin{aligned} \begin{aligned} \Vert \textbf{w} \Vert _2&= \Vert \sum _{i=1}^n \alpha _i \textbf{x}_i\Vert _2 \leqslant \sum _{i=1}^n |\alpha _i |\cdot \Vert \textbf{x}_i \Vert _2 \\&\leqslant C \sum _{i=1}^n \Vert \textbf{G}^{1/2} \textbf{e}_i \Vert _2 \leqslant n C \Vert \textbf{G}^{1/2} \Vert _2. \end{aligned} \end{aligned}$$
(A8)

Analogously, we have

$$\begin{aligned} \Vert \textbf{w}^\prime \Vert _2 \leqslant n C \Vert {\textbf{G}^\prime }^{1/2} \Vert _2. \end{aligned}$$
(A9)

Thus, (A7) can be rewritten as

$$\begin{aligned} \begin{aligned}&\Vert \textbf{w}^\prime -\textbf{w}\Vert ^2_2 \\ \leqslant&\ C_0^2 \sum _{i=1}^n\Vert y_i \textbf{x}_i -y^\prime _i \textbf{x}^\prime _i\Vert _2\\ &=\ C_0^2 \sum _{i=1}^n\Vert y_i\textbf{G}^{1/2}\textbf{e}_i - y^\prime _i{\textbf{G}^\prime }^{1/2}\textbf{e}_i\Vert _2\\ &=\ C_0^2 \sum _{i=1}^n\Vert (y_i\textbf{G}^{1/2}\textbf{e}_i - y^\prime _i\textbf{G}^{1/2}\textbf{e}_i) + (y^\prime _i\textbf{G}^{1/2}\textbf{e}_i - y^\prime _i{\textbf{G}^\prime }^{1/2}\textbf{e}_i)\Vert _2\\ \leqslant&\ C_0^2 \Bigl ( \Vert \textbf{G}^{1/2}\Vert _2 \sum _{i=1}^n |y_i - y^\prime _i|+ \sum _{i=1}^n \Vert \textbf{G}^{1/2}-{\textbf{G}^\prime }^{1/2}\Vert _2 \Bigr ) \\ \leqslant&\ C_0^2 \Vert \textbf{G}^{1/2}\Vert _2 \Vert \textbf{y} - \textbf{y}^\prime \Vert _1 + nC_0^2 \Vert \textbf{G} - \textbf{G}^\prime \Vert ^{1/2}_2 \end{aligned} \end{aligned}$$
(A10)

Here we use the inequality \(\Vert \textbf{G}^{1/2}-{\textbf{G}^\prime }^{1/2}\Vert _2 \leqslant \Vert \textbf{G}-\textbf{G}^\prime \Vert ^{1/2}_2\).
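
This is a standard perturbation bound for square roots of positive semidefinite matrices; a brief justification: let \(\textbf{C} = \textbf{G}^{1/2}-{\textbf{G}^\prime }^{1/2}\) and let \(\lambda \) be an eigenvalue of \(\textbf{C}\) with the largest magnitude and unit eigenvector \(\textbf{v}\). Since \(\textbf{G} - \textbf{G}^\prime = \textbf{G}^{1/2}\textbf{C} + \textbf{C}{\textbf{G}^\prime }^{1/2}\),

$$\begin{aligned} \Vert \textbf{G}-\textbf{G}^\prime \Vert _2 \geqslant |\textbf{v}^\top (\textbf{G}-\textbf{G}^\prime )\textbf{v} |= |\lambda |\, \textbf{v}^\top (\textbf{G}^{1/2}+{\textbf{G}^\prime }^{1/2})\textbf{v} \geqslant |\lambda |\cdot |\textbf{v}^\top \textbf{C}\textbf{v} |= \lambda ^2, \end{aligned}$$

so \(\Vert \textbf{G}^{1/2}-{\textbf{G}^\prime }^{1/2}\Vert _2 = |\lambda |\leqslant \Vert \textbf{G}-\textbf{G}^\prime \Vert ^{1/2}_2\).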

We also have

$$\begin{aligned} \begin{aligned} \Vert \textbf{x} - \textbf{x}^\prime \Vert _2 &=\ \Vert \sum _{i=1}^n \beta _i \textbf{x}_i - \sum _{i=1}^n \beta _i \textbf{x}_i^\prime \Vert _2\\ &=\ \Vert \sum _{i=1}^n \beta _i (\textbf{G}^{1/2} - {\textbf{G}^\prime }^{1/2}) \textbf{e}_i\Vert _2 \\ \leqslant&\ \Vert \textbf{G}^{1/2} - {\textbf{G}^\prime }^{1/2}\Vert _2\Vert \varvec{\beta }\Vert _2 \\ \leqslant&\ \Vert \textbf{G} - {\textbf{G}^\prime }\Vert ^{1/2}_2\Vert \varvec{\beta }\Vert _2 \end{aligned} \end{aligned}$$
(A11)

and

$$\begin{aligned} \Vert \textbf{x} \Vert _2 = \Vert \sum _{i=1}^n \beta _i \textbf{x}_i \Vert _2 \leqslant \Vert \textbf{G}^{1/2}\Vert _2\Vert \varvec{\beta }\Vert _2. \end{aligned}$$
(A12)

Substituting (A9)–(A12) into (A1) yields

$$\begin{aligned} \begin{aligned}&|h(\textbf{x}) - h^\prime (\textbf{x}^\prime )|\\ \leqslant&\ C_0 (\Vert \textbf{G}^{1/2}\Vert _2\Vert \textbf{y} - \textbf{y}^\prime \Vert _1 + n \Vert \textbf{G} - \textbf{G}^\prime \Vert ^{1/2}_2)^{1/2} \Vert \textbf{G}^{1/2} \Vert _2 \Vert \varvec{\beta } \Vert _2 \\&+ nC \Vert {\textbf{G}^\prime }^{1/2} \Vert _2 \Vert \varvec{\beta }\Vert _2\Vert \textbf{G} - \textbf{G}^\prime \Vert ^{1/2}_2\\ \leqslant&\ C_0 (\Vert \textbf{G}^{1/2}\Vert _2^{1/2} \Vert \textbf{y} - \textbf{y}^\prime \Vert _1^{1/2} + \sqrt{n} \Vert \textbf{G} - \textbf{G}^\prime \Vert ^{1/4}_2) \Vert \textbf{G}^{1/2} \Vert _2 \Vert \varvec{\beta } \Vert _2 \\&+ nC \Vert {\textbf{G}^\prime }^{1/2} \Vert _2 \Vert \varvec{\beta }\Vert _2\Vert \textbf{G} - \textbf{G}^\prime \Vert ^{1/2}_2. \end{aligned} \end{aligned}$$
(A13)

The last inequality results from \((a+b)^{1/2}\leqslant \sqrt{a}+\sqrt{b}\). \(\square \)
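
The bound can be illustrated numerically (this is not one of the paper's experiments): train the hinge-loss linear SVM on the features \(\textbf{x}_i = \textbf{G}^{1/2}\textbf{e}_i\) for an original Gram matrix \(\textbf{G}\) and a slightly perturbed \(\textbf{G}^\prime \), then compare \(\Vert \textbf{w}-\textbf{w}^\prime \Vert _2\) with the driving term \(\Vert \textbf{G}-\textbf{G}^\prime \Vert ^{1/2}_2\). The sketch below assumes problem (12) is the standard bias-free hinge-loss linear SVM and uses scikit-learn's LinearSVC as a stand-in; the toy data and constants are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def sqrtm_psd(G):
    """Symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(G)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

rng = np.random.default_rng(1)
n = 40
X = rng.normal(size=(n, 5))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=n) > 0, 1, -1)

G = X @ X.T
E = rng.normal(size=(n, n))
G_prime = G + 0.01 * (E + E.T)                  # small symmetric perturbation, labels unchanged

F, F_prime = sqrtm_psd(G), sqrtm_psd(G_prime)   # rows are x_i = G^(1/2) e_i, as in the proof

C = 1.0
svm_1 = LinearSVC(C=C, loss="hinge", fit_intercept=False, max_iter=100000).fit(F, y)
svm_2 = LinearSVC(C=C, loss="hinge", fit_intercept=False, max_iter=100000).fit(F_prime, y)
w, w_prime = svm_1.coef_.ravel(), svm_2.coef_.ravel()

print("||w - w'||_2       =", np.linalg.norm(w - w_prime))
print("||G - G'||_2^(1/2) =", np.linalg.norm(G - G_prime, 2) ** 0.5)
```

Setting fit_intercept=False matches the bias-free decision function \(h(\textbf{x}) = \textbf{w}^\top \textbf{x}\) used in the proof; the theorem guarantees an upper bound of this form, not equality.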

Appendix B: Incomplete Cholesky decomposition (ICD) algorithm

The ICD algorithm is shown in Algorithm 2.

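A minimal sketch of incomplete Cholesky with symmetric pivoting is given below. It follows the standard textbook procedure and the update rules used in the Appendix C proof (pivot on the largest residual diagonal, then fill one column of the factor), but it is not claimed to be identical to the authors' Algorithm 2; the function name icd, the rank budget k, and the stopping tolerance tol are illustrative.

```python
import numpy as np

def icd(G, k, tol=1e-10):
    """Incomplete Cholesky with symmetric pivoting (a sketch, not the authors'
    exact Algorithm 2).  Returns an n x s factor L (s <= k) and the pivot order
    piv such that L @ L.T approximates G[piv][:, piv]."""
    G = np.array(G, dtype=float, copy=True)
    n = G.shape[0]
    L = np.zeros((n, k))
    piv = np.arange(n)
    d = np.diag(G).copy()                      # residual diagonal entries
    for s in range(k):
        p = s + int(np.argmax(d[s:]))          # symmetric pivot: largest residual diagonal
        if d[p] <= tol:                        # remaining residual is numerically zero
            return L[:, :s], piv
        # swap positions s and p in everything that is tracked
        piv[[s, p]] = piv[[p, s]]
        d[[s, p]] = d[[p, s]]
        G[[s, p], :] = G[[p, s], :]
        G[:, [s, p]] = G[:, [p, s]]
        L[[s, p], :s] = L[[p, s], :s]
        # fill column s of the factor
        L[s, s] = np.sqrt(d[s])
        L[s + 1:, s] = (G[s + 1:, s] - L[s + 1:, :s] @ L[s, :s]) / L[s, s]
        d[s + 1:] -= L[s + 1:, s] ** 2
        d[s] = 0.0
    return L, piv
```

For \(k = n\) and a positive definite \(\textbf{G}\), the loop runs to completion and \(\textbf{L}\textbf{L}^\top \) reproduces the symmetrically permuted Gram matrix exactly.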

Appendix C: Proof of Theorem 3

Proof

The second and the last statements can be inferred from the first and the third ones. Thus, we only need to prove (i) and (iii). Since \(\textbf{G}\) and \(\textbf{G}^\prime \) are isomorphic, there exists a bijection \(\pi _0: \{1,2,\cdots ,n\}\rightarrow \{1,2,\cdots ,n\}\) such that

$$\begin{aligned} \textbf{G}(i,j) = \textbf{G}^\prime (\pi _0(i),\pi _0(j)),\ 1\leqslant i,j\leqslant n. \end{aligned}$$

We will argue by mathematical induction.

1) For \(s = 1\), let \(p_1 = \arg \max _{1 \leqslant i \leqslant n} \textbf{G}(i,i)\) and \(q_1 = \arg \max _{1 \leqslant i \leqslant n} \textbf{G}^\prime (i,i)\). Since the largest diagonal elements of \(\textbf{G}\) and \(\textbf{G}^\prime \) are unique, we have \(q_1 = \pi _0(p_1)\). We construct bijections

$$\begin{aligned} \omega _1(i) = \left\{ \begin{array}{ll} i, &{} i\ne 1, p_1;\\ p_1, &{} i = 1;\\ 1, &{} i = p_1. \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} \omega ^\prime _1(i) = \left\{ \begin{array}{ll} i, &{} i\ne 1, q_1;\\ q_1, &{} i = 1;\\ 1, &{} i = q_1. \end{array}\right. \end{aligned}$$

If \(p_1 = 1\) (\(q_1 = 1\)), then \(\omega _1\) (\(\omega ^\prime _1\)) is an identity mapping. Define a composite function \(\pi _1: \{1,2,\cdots ,n\}\rightarrow \{1,2,\cdots ,n\}\):

$$\begin{aligned} \pi _1(i) = \omega ^\prime _1\circ \pi _0\circ \omega _1(i). \end{aligned}$$
(C14)

Then

$$\begin{aligned} \pi _1(1) = \omega ^\prime _1\circ \pi _0\circ \omega _1(1) = \omega ^\prime _1\circ \pi _0(p_1) = \omega ^\prime _1(q_1) = 1. \end{aligned}$$

The first diagonal element of \(\textbf{A}^{(1)}\) is

$$\begin{aligned} \begin{aligned} \textbf{A}^{(1)}(1,1)&= \sqrt{\textbf{G}(p_1, p_1)} = \sqrt{\textbf{G}^\prime (\pi _0(p_1),\pi _0( p_1))} \\&= \sqrt{\textbf{G}^\prime (q_1, q_1)} = \textbf{B}^{(1)}(1,1) = \textbf{B}^{(1)}(\pi _1(1),1). \end{aligned} \end{aligned}$$

The remaining elements of the first column of \(\textbf{A}^{(1)}\) are

$$\begin{aligned} \begin{aligned} \textbf{A}^{(1)}(i,1)&= \left\{ \begin{array}{ll} \frac{\textbf{G}(i, p_1)}{\textbf{A}^{(1)}(1,1)}, &{} i\ne p_1;\\ \frac{\textbf{G}(1, p_1)}{\textbf{A}^{(1)}(1,1)}, &{} i = p_1.\\ \end{array}\right. \\&= \frac{\textbf{G}(\omega _1(i), p_1)}{\textbf{A}^{(1)}(1,1)}, \ 2\leqslant i \leqslant n. \end{aligned} \end{aligned}$$

In a similar way,

$$\begin{aligned} \textbf{B}^{(1)}(i,1) = \frac{\textbf{G}^\prime (\omega _1^\prime (i), q_1)}{\textbf{B}^{(1)}(1,1)}, \ 2\leqslant i \leqslant n. \end{aligned}$$

Note that \(\textbf{A}^{(1)}(1,1) = \textbf{B}^{(1)}(1,1)\), \(\omega ^\prime _1\circ \omega ^\prime _1(i) = i\) and \(\textbf{G}(i,p_1) = \textbf{G}^\prime (\pi _0(i),\pi _0(p_1))\). Then we obtain

$$\begin{aligned} \begin{aligned} \textbf{A}^{(1)}(i,1) = \ {}&\frac{\textbf{G}(\omega _1(i), p_1)}{\textbf{A}^{(1)}(1,1)} = \frac{\textbf{G}^\prime (\pi _0\circ \omega _1(i), \pi _0(p_1))}{\textbf{B}^{(1)}(1,1)}\\ = \ {}&\frac{\textbf{G}^\prime (\omega ^\prime _1\circ \omega ^\prime _1\circ \pi _0\circ \omega _1(i), q_1)}{\textbf{B}^{(1)}(1,1)} \\ = \ {}&\frac{\textbf{G}^\prime (\omega ^\prime _1\circ \pi _1(i), q_1)}{\textbf{B}^{(1)}(1,1)}\\ = \ {}&\textbf{B}^{(1)}(\pi _1(i),1), \ 2\leqslant i \leqslant n. \end{aligned} \end{aligned}$$

Let \(\textbf{P}^{(1)} = \textbf{P}[1,p_1]\) and \(\textbf{Q}^{(1)} = \textbf{P}[1,q_1]\). Denote \(\textbf{G}^{(1)} = \textbf{P}^{(1)}\textbf{G}\textbf{P}^{(1)}\) and \({\textbf{G}^\prime }^{(1)} = \textbf{Q}^{(1)}\textbf{G}^\prime \textbf{Q}^{(1)}\), then

$$\begin{aligned} \textbf{G}^{(1)}(i,j) = (\textbf{P}^{(1)}\textbf{G}\textbf{P}^{(1)})(i,j) = \textbf{G}(\omega _1(i), \omega _1(j)) \end{aligned}$$
(C15)

and

$$\begin{aligned} \begin{aligned} {\textbf{G}^\prime }^{(1)}(\pi _1(i),\pi _1(j)) = \ {}&(\textbf{Q}^{(1)} \textbf{G}^\prime \textbf{Q}^{(1)})(\pi _1(i),\pi _1(j))\\ = \ {}&\textbf{G}^\prime (\omega ^\prime _1\circ \pi _1(i),\omega ^\prime _1\circ \pi _1(j))\\ = \ {}&\textbf{G}^\prime (\pi _0\circ \omega _1(i),\pi _0\circ \omega _1(j))\\ = \ {}&\textbf{G}(\omega _1(i), \omega _1(j)), \end{aligned} \end{aligned}$$
(C16)

where the third equality results from (C14). Combining (C15) and (C16) yields \(\textbf{G}^{(1)}(i,j) = {\textbf{G}^\prime }^{(1)}(\pi _1(i),\pi _1(j))\). Therefore, the theorem holds for \(s=1\).

2) Assume that the theorem holds for \(s = m\). There exists a bijection \(\pi _m:\{1,2,\cdots ,n\}\rightarrow \{1,2,\cdots ,n\}\) such that \(\pi _m(i) = i\) (\(1\leqslant i\leqslant m\)) and for every i,

$$\begin{aligned} \textbf{A}^{(m)}(i,j) = \textbf{B}^{(m)}(\pi _m(i),j), \ 1\leqslant j\leqslant m. \end{aligned}$$
(C17)

There also exist permutation matrices \(\textbf{P}^{(m)}\) and \(\textbf{Q}^{(m)}\) such that

$$\begin{aligned} \textbf{G}^{(m)}(i,j) = {\textbf{G}^\prime }^{(m)}(\pi _m(i),\pi _m(j)), \end{aligned}$$
(C18)

where \(\textbf{G}^{(m)} = \textbf{P}^{(m)} \textbf{G} \textbf{P}^{(m)}\) and \({\textbf{G}^\prime }^{(m)} = \textbf{Q}^{(m)} \textbf{G}^\prime \textbf{Q}^{(m)}\).

3) We will show that the theorem holds for \(s = m+1\). The \((m+1)\)th to the \(n\)th diagonal elements of \(\textbf{A}^{(m)}\) are

$$\begin{aligned} \begin{aligned} \textbf{A}^{(m)}&(i,i) = \ \textbf{G}^{(m)}(i,i) - \sum _{j=1}^m\textbf{A}^{(m)}(i,j)^2\\ &=\ {\textbf{G}^\prime }^{(m)}(\pi _m(i),\pi _m(i)) - \sum _{j=1}^m\textbf{B}^{(m)}(\pi _m(i),j)^2\\ &=\ \textbf{B}^{(m)}(\pi _m(i),\pi _m(i)), \ m+1\leqslant i \leqslant n. \end{aligned} \end{aligned}$$

Note that \(\pi _m(i) = i\) (\(1\leqslant i\leqslant m\)) and \(\pi _m\) is a bijection. Let \(p_{m+1} = \arg \max _{m+1\leqslant i\leqslant n}\textbf{A}^{(m)}(i,i)\) and \(q_{m+1} = \arg \max _{m+1\leqslant i\leqslant n} \textbf{B}^{(m)}(i,i)\). We have \(q_{m+1} = \pi _m(p_{m+1})\) because the largest residual diagonal element selected by symmetric pivoting is unique. We construct bijections

$$\begin{aligned} \omega _{m+1}(i) = \left\{ \begin{array}{ll} i, &{} i\ne m+1, p_{m+1};\\ p_{m+1}, &{} i = m+1;\\ m+1, &{} i = p_{m+1}. \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} \omega ^\prime _{m+1}(i) = \left\{ \begin{array}{ll} i, &{} i\ne m+1, q_{m+1};\\ q_{m+1}, &{} i = m+1;\\ m+1, &{} i = q_{m+1}. \end{array}\right. \end{aligned}$$

Define a composite function \(\pi _{m+1}:\{1,2,\cdots ,n\}\rightarrow \{1,2,\cdots ,n\}\):

$$\begin{aligned} \pi _{m+1}(i) = \omega ^\prime _{m+1}\circ \pi _m\circ \omega _{m+1}(i). \end{aligned}$$
(C19)

For \(1 \leqslant i \leqslant m\),

$$\begin{aligned} \begin{aligned}&\pi _{m+1}(i) = \omega ^\prime _{m+1}\circ \pi _m\circ \omega _{m+1}(i)\\ &=\omega ^\prime _{m+1}\circ \pi _m(i) = \omega ^\prime _{m+1}(i) = i. \end{aligned} \end{aligned}$$

We also have

$$\begin{aligned} \begin{aligned}&\pi _{m+1}(m+1) = \omega ^\prime _{m+1}\circ \pi _m\circ \omega _{m+1}(m+1) \\ &=\omega ^\prime _{m+1}\circ \pi _m(p_{m+1}) = \omega ^\prime _{m+1}(q_{m+1}) = m+1. \end{aligned} \end{aligned}$$

Thus, \(\pi _{m+1}(i) = i \ (1\leqslant i\leqslant m+1)\).

Let \(\textbf{P}^{(m+1)} = \textbf{P}[m+1,p_{m+1}] \textbf{P}^{(m)}\) and \(\textbf{Q}^{(m+1)} = \textbf{P}[m+1,q_{m+1}] \textbf{Q}^{(m)}\). Denote \(\textbf{G}^{(m+1)} = \textbf{P}^{(m+1)} \textbf{G} \textbf{P}^{(m+1)} \) and \({\textbf{G}^\prime }^{(m+1)} = \textbf{Q}^{(m+1)} {\textbf{G}^\prime } \textbf{Q}^{(m+1)}\), then

$$\begin{aligned} \begin{aligned} \textbf{G}^{(m+1)}(i,j) &=\ (\textbf{P}^{(m+1)} \textbf{G} \textbf{P}^{(m+1)})(i,j) \\ &=\ \textbf{G}^{(m)}(\omega _{m+1}(i), \omega _{m+1}(j)), \end{aligned} \end{aligned}$$
(C20)

and

$$\begin{aligned} \begin{aligned}&\ {\textbf{G}^\prime }^{(m+1)}(\pi _{m+1}(i),\pi _{m+1}(j)) \\ &=\ (\textbf{Q}^{(m+1)} {\textbf{G}^\prime } \textbf{Q}^{(m+1)})(\pi _{m+1}(i),\pi _{m+1}(j))\\ &=\ {\textbf{G}^\prime }^{(m)}(\omega ^\prime _{m+1}\circ \pi _{m+1}(i),\omega ^\prime _{m+1}\circ \pi _{m+1}(j))\\ &=\ {\textbf{G}^\prime }^{(m)}(\pi _m\circ \omega _{m+1}(i),\pi _m\circ \omega _{m+1}(j))\\ &=\ \textbf{G}^{(m)}(\omega _{m+1}(i), \omega _{m+1}(j)), \end{aligned} \end{aligned}$$
(C21)

where the last two equalities follow from (C19) and (C18), respectively. Combining (C20) and (C21) yields

$$\begin{aligned} \textbf{G}^{(m+1)}(i,j) = {\textbf{G}^\prime }^{(m+1)}(\pi _{m+1}(i), \pi _{m+1}(j)). \end{aligned}$$

For \(1 \leqslant i \leqslant n\) and \(1 \leqslant j \leqslant m\),

$$\begin{aligned} \begin{aligned} \textbf{A}^{(m+1)}(i,j) &=\left\{ \begin{array}{ll} \textbf{A}^{(m)}(i,j), &{} i\ne m+1, p_{m+1};\\ \textbf{A}^{(m)}(p_{m+1},j), &{} i = m+1;\\ \textbf{A}^{(m)}(m+1,j), &{} i = p_{m+1}.\\ \end{array}\right. \\ = \ {}&\textbf{A}^{(m)}(\omega _{m+1}(i),j) \\ = \ {}&\textbf{B}^{(m)}(\pi _m\circ \omega _{m+1}(i),j)\\ = \ {}&\textbf{B}^{(m)}(\omega ^\prime _{m+1}\circ \pi _{m+1}(i),j)\\ = \ {}&\textbf{B}^{(m+1)}(\pi _{m+1}(i),j), \end{aligned} \end{aligned}$$

where the third and the fourth equalities result from (C17) and (C19), respectively. For the \((m+1)\)th diagonal element of \(\textbf{A}^{(m+1)}\),

$$\begin{aligned} \begin{aligned}&\textbf{A}^{(m+1)}(m+1,m+1) = \sqrt{\textbf{G}^{(m)}(p_{m+1}, p_{m+1})} \\ = \ {}&\sqrt{{\textbf{G}^\prime }^{(m)}(\pi _m(p_{m+1}),\pi _m( p_{m+1}))} \\ = \ {}&\sqrt{{\textbf{G}^\prime }^{(m)}(q_{m+1}, q_{m+1})} \\ = \ {}&\textbf{B}^{(m+1)}(m+1,m+1) \end{aligned} \end{aligned}$$

and for \(m + 2 \leqslant i \leqslant n\),

$$\begin{aligned} \begin{aligned}&\ \textbf{A}^{(m + 1)}(i,m + 1) \\ &=\ \frac{{\textbf{G}^{(m + 1)}(i,m + 1)}}{{\textbf{A}^{(m + 1)}(m + 1,m + 1)}} -\\&\ \frac{\sum \limits _{l = 1}^m {\textbf{A}^{(m + 1)}(i,l)} \textbf{A}^{(m + 1)}(m + 1,l)}{\textbf{A}^{(m + 1)}(m + 1,m + 1)}\\ &=\ \frac{{\textbf{G}^\prime }^{(m + 1)}({\pi _{m + 1}}(i),{\pi _{m + 1}}(m + 1))}{\textbf{B}^{(m + 1)}(m + 1,m + 1)} -\\&\ \frac{\sum \limits _{l = 1}^m {\textbf{B}^{(m + 1)}({\pi _{m + 1}}(i),l)} \textbf{B}^{(m + 1)}({\pi _{m + 1}}(m + 1),l)}{\textbf{B}^{(m + 1)}(m + 1,m + 1)}\\ &=\ \frac{{\textbf{G}^\prime }^{(m+1)}({\pi _{m + 1}}(i),m + 1)}{\textbf{B}^{(m + 1)}(m + 1,m + 1)} -\\&\ \frac{\sum \limits _{l = 1}^m {\textbf{B}^{(m + 1)}({\pi _{m + 1}}(i),l)} \textbf{B}^{(m + 1)}(m + 1,l)}{\textbf{B}^{(m + 1)}(m + 1,m + 1)}\\ &=\ \textbf{B}^{(m + 1)}({\pi _{m + 1}}(i),m + 1). \end{aligned} \end{aligned}$$

Thus, for every i we obtain

$$\begin{aligned} \textbf{A}^{(m+1)}(i,j) = \textbf{B}^{(m+1)}(\pi _{m+1}(i),j),\ 1\leqslant j\leqslant m+1. \end{aligned}$$

The proof is complete by mathematical induction.

\(\square \)
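
A small numerical sanity check of this permutation property, reusing the icd sketch from Appendix B (the toy data and the permutation are illustrative; distinct diagonal entries are used so that the uniqueness assumption on the pivots holds):

```python
import numpy as np
# Assumes the icd() sketch from Appendix B is in scope.

rng = np.random.default_rng(2)
n = 8
X = rng.normal(size=(n, 4))
G = X @ X.T + np.diag(rng.uniform(0.1, 1.0, n))    # generic PSD matrix with distinct diagonal

pi = rng.permutation(n)                            # an isomorphism between G and G'
P = np.eye(n)[pi]
G_prime = P @ G @ P.T                              # G'(i, j) = G(pi(i), pi(j))

A, pivA = icd(G, n)
B, pivB = icd(G_prime, n)

# Statement (i): every row of A reappears as a row of B, matched through the
# underlying data point each pivot refers to.
row_of_B = {int(pi[pivB[j]]): j for j in range(n)} # original index -> row index in B
for i in range(n):
    assert np.allclose(A[i], B[row_of_B[int(pivA[i])]])
print("Rows of the two ICD factors agree up to a permutation.")
```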

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Pan, B., Chen, WS., Deng, L. et al. Classifier selection using geometry preserving feature. Neural Comput & Applic 35, 20955–20976 (2023). https://doi.org/10.1007/s00521-023-08828-y

