Generalising combinatorial discriminant analysis through conditioning truncated Rayleigh flow

Yang, Sijia; Xiong, Haoyi; Hu, Di; Xu, Kaibo; Wang, Licheng; Zhu, Peizhen; Sun, Zeyi

doi:10.1007/s10115-021-01587-z

Regular Paper
Published: 09 July 2021

Volume 63, pages 2189–2208, (2021)

Knowledge and Information Systems

Sijia Yang¹,
Haoyi Xiong²,
Di Hu³,
Kaibo Xu⁴,
Licheng Wang¹,
Peizhen Zhu⁵ &
…
Zeyi Sun ORCID: orcid.org/0000-0003-0704-2708⁴

Abstract

Fisher’s Linear Discriminant Analysis (LDA) has been widely used for linear classification, feature selection, and metric learning in multivariate data analytics. To ensure high classification accuracy while optimally discovering predictive features from the data, this paper studies \(\mathbf{CDA}\), namely Combinatorial Discriminant Analysis, which combinatorially selects a subset of features and assigns optimal weights to them. \(\mathbf{CDA}\) extends the Truncated Rayleigh Flow algorithm (Tan et al. in J R Stat Soc: Ser B (Stat Methodol) 80(5):1057–1086, 2018) and improves LDA estimation under a k-sparsity constraint. Experimental results on synthesized and real-world datasets demonstrate that our algorithm outperforms other LDA baselines and downstream classifiers. The empirical analysis shows that our algorithm can recover the combinatorial structure of the optimal LDA with empirical consistency.
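
The abstract names the Truncated Rayleigh Flow algorithm of Tan et al. (2018) as the method that CDA extends, without spelling out its iteration. As a rough illustration only, the following Python sketch shows that baseline iteration for a k-sparse generalized eigenvector of a matrix pair (A, B): a gradient-style step on the Rayleigh quotient, followed by hard truncation to the k largest-magnitude entries. It is not the authors' CDA conditioning step, which this page does not describe. The function names, the step size `eta`, the initial vector and the toy matrices are all assumptions made for the example; for two-class LDA one would typically take A as a between-class scatter matrix and B as a within-class covariance estimate.

```python
import numpy as np


def truncate(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out


def truncated_rayleigh_flow(A, B, v0, k, eta=0.5, n_iter=100):
    """Sketch of the baseline iteration (hypothetical names and defaults).

    Approximates the leading k-sparse generalized eigenvector of (A, B),
    i.e. a sparse v with a large ratio (v' A v) / (v' B v).
    """
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_iter):
        rho = (v @ A @ v) / (v @ B @ v)      # current Rayleigh quotient
        if rho <= 0:
            raise ValueError("Rayleigh quotient must be positive; pick another v0")
        # Gradient-style ascent step: (I + (eta / rho) * (A - rho * B)) v
        step = v + (eta / rho) * (A @ v - rho * (B @ v))
        step /= np.linalg.norm(step)
        v = truncate(step, k)                # enforce the k-sparsity constraint
        v /= np.linalg.norm(v)
    return v


# Toy problem (assumed, not from the paper): the mean difference d is
# supported on the first three of ten features, and B is the identity.
d = np.zeros(10)
d[:3] = [3.0, 2.0, 1.0]
A = np.outer(d, d)        # rank-one between-class scatter
B = np.eye(10)            # within-class covariance
v_hat = truncated_rayleigh_flow(A, B, v0=np.ones(10), k=3)
print(np.nonzero(v_hat)[0])
```

In this toy setting the optimal direction is proportional to B⁻¹d, so the estimate should concentrate on the three informative features. With real data the result depends heavily on the initial vector and the step size, which the sketch fixes arbitrarily.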

References

Tan KM, Wang Z, Liu H, Zhang T (2018) Sparse generalized eigenvalue problem: optimal statistical rates via truncated Rayleigh flow. J R Stat Soc: Ser B (Stat Methodol) 80(5):1057–1086
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Hum Genet 7(2):179–188
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, Hoboken
Alipanahi B, Biggs M, Ghodsi A et al (2008) Distance metric learning vs. Fisher discriminant analysis. In: Proceedings of the 23rd national conference on artificial intelligence, vol 2, pp 598–603
Kulis B et al (2013) Metric learning: a survey. Found Trends Mach Learn 5(4):287–364
Peck R, Van Ness J (1982) The use of shrinkage estimators in linear discriminant analysis. IEEE Trans Pattern Anal Mach Intell 5:530–537
Buhlmann P, Van De Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, Berlin
Amin KM (2012) Combinatorial regression and improved basis pursuit for sparse estimation. California Institute of Technology, Pasadena
Witten DM, Tibshirani R (2009) Covariance-regularized regression and classification for high dimensional problems. J R Stat Soc: Ser B (Stat Methodol) 71(3):615–636
Cai T, Liu W (2011) A direct estimation approach to sparse linear discriminant analysis. J Am Stat Assoc 106(496):1566–1577
Clemmensen L, Hastie T, Witten D, Ersboll B (2011) Sparse discriminant analysis. Technometrics 53(4)
Shao J, Wang Y, Deng X, Wang S et al (2011) Sparse linear discriminant analysis by thresholding for high dimensional data. Ann Stat 39(2):1241–1265
Li Y, Jia J et al (2017) L1 least squares for sparse high-dimensional LDA. Electron J Stat 11(1):2499–2518
Baraniuk RG (2007) Compressive sensing. IEEE Signal Process Mag 24(4)
Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15(1):2869–2909
Jankova J, Geer S et al (2015) Confidence intervals for high-dimensional inverse covariance estimation. Electron J Stat 9(1):1205–1229
Cai TT, Wang L (2011) Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans Inf Theory 57(7):4680–4688
Krzanowski WJ, Jonathan P, McCarthy WV, Thomas MR (1995) Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Appl Stat 44:101–115
Ye J (2007) Least squares linear discriminant analysis. In: Proceedings of the 24th international conference on machine learning, pp 1087–1093. ACM
Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432–441
Anderson TW (1962) An introduction to multivariate statistical analysis. Technical report, Wiley, New York
Tropp JA, Gilbert AC (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans Inf Theory 53(12):4655–4666
Globerson A, Roweis ST (2006) Metric learning by collapsing classes. In: Advances in neural information processing systems, pp 451–458
Cai TT, Ren Z, Zhou HH et al (2016) Estimating structured high-dimensional covariance and precision matrices: optimal rates and adaptive estimation. Electron J Stat 10(1):1–59
Rothman AJ, Bickel PJ, Levina E, Zhu J et al (2008) Sparse permutation invariant covariance estimation. Electron J Stat 2:494–515
Witten DM, Friedman JH, Simon N (2011) New insights and faster computations for the graphical lasso. J Comput Graph Stat 20(4):892–900
Yu Y, Wang T, Samworth RJ (2014) A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102(2):315–323
Lin C-J (2017) Libsvm data: classification (binary class)
Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci 99(10):6567–6572
Yang D, Zhang D, Chen L, Qu B (2015) Nationtelescope: monitoring and visualizing large-scale collective behavior in lbsns. J Netw Comput Appl 55:170–180

Author information

Authors and Affiliations

School of Cyberspace Security & State Key Laboratory of Networking and Switching, Beijing University of Posts and Telecommunications, Haidian, Beijing, China
Sijia Yang & Licheng Wang
Big Data Lab, Baidu Research, Beijing, China
Haoyi Xiong
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Di Hu
Mininglamp Academy of Sciences, Mininglamp Technology, Beijing, 100084, China
Kaibo Xu & Zeyi Sun
Department of Computer Sciences, Missouri University of Science and Technology, Rolla, MO, USA
Peizhen Zhu

Corresponding authors

Correspondence to Licheng Wang or Zeyi Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 219 KB)

About this article

Cite this article

Yang, S., Xiong, H., Hu, D. et al. Generalising combinatorial discriminant analysis through conditioning truncated Rayleigh flow. Knowl Inf Syst 63, 2189–2208 (2021). https://doi.org/10.1007/s10115-021-01587-z

Received: 28 October 2020
Revised: 18 June 2021
Accepted: 19 June 2021
Published: 09 July 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s10115-021-01587-z
