Abstract
Fisher’s Linear Discriminant Analysis (LDA) has been widely used for linear classification, feature selection, and metric learning in multivariate data analytics. To ensure high classification accuracy while optimally discovering predictive features from the data, this paper studies \(\mathbf {CDA}\), namely Combinatorial Discriminant Analysis, which combinatorially selects a subset of features and assigns weights to them optimally. \(\mathbf {CDA}\) extends the Truncated Rayleigh Flow algorithm (Tan et al. in J R Stat Soc: Ser B (Stat Methodol) 80(5):1057–1086, 2018) and improves LDA estimation under a k-sparsity constraint. Experimental results on synthetic and real-world datasets demonstrate that our algorithm outperforms other LDA baselines and downstream classifiers. The empirical analysis shows that our algorithm can recover the combinatorial structure of the optimal LDA with empirical consistency.
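For orientation, the baseline that \(\mathbf {CDA}\) extends can be sketched as follows. This is a minimal illustration of plain Truncated Rayleigh Flow applied to a two-class Fisher criterion with at most k nonzero weights, not the \(\mathbf {CDA}\) algorithm itself. The function names, the step-size rule, the ridge term, and the initialisation from the mean difference are all assumptions made for this sketch.

```python
import numpy as np

def truncated_rayleigh_flow(A, B, k, v0, eta=None, n_iter=200):
    """Approximately maximise (v'Av)/(v'Bv) subject to at most k nonzeros in v."""
    if eta is None:
        # Assumed step size: keeps eta * lambda_max(B) below 1.
        eta = 0.5 / np.linalg.eigvalsh(B)[-1]
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_iter):
        rho = (v @ A @ v) / (v @ B @ v)  # current Rayleigh quotient
        if rho <= 0:
            raise ValueError("initial vector must have a positive Rayleigh quotient")
        # Ascent step on the quotient, followed by renormalisation.
        v = v + (eta / rho) * (A @ v - rho * (B @ v))
        v = v / np.linalg.norm(v)
        # Truncation: keep the k largest-magnitude entries, zero out the rest.
        keep = np.argsort(np.abs(v))[-k:]
        truncated = np.zeros_like(v)
        truncated[keep] = v[keep]
        v = truncated / np.linalg.norm(truncated)
    return v

def sparse_lda_direction(X, y, k, ridge=1e-6):
    """Hypothetical helper: k-sparse discriminant direction for labels y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    delta = X1.mean(axis=0) - X0.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-class covariance; the small ridge is an assumption
    # that keeps B positive definite.
    B = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    B = B + ridge * np.eye(X.shape[1])
    A = np.outer(delta, delta)  # rank-one between-class scatter
    return truncated_rayleigh_flow(A, B, k, v0=delta)
```

Calling `sparse_lda_direction(X, y, k)` returns a unit-norm weight vector with at most k nonzero entries. The indices of the nonzero entries are the selected features, and a sample x can then be scored by projecting it onto that vector.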
References
Tan KM, Wang Z, Liu H, Zhang T (2018) Sparse generalized eigenvalue problem: optimal statistical rates via truncated Rayleigh flow. J R Stat Soc: Ser B (Stat Methodol) 80(5):1057–1086
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Hum Genet 7(2):179–188
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, Hoboken
Alipanahi B, Biggs M, Ghodsi A et al (2008) Distance metric learning vs. Fisher discriminant analysis. In: Proceedings of the 23rd national conference on artificial intelligence, vol 2, pp 598–603
Kulis B et al (2013) Metric learning: a survey. Found Trends Mach Learn 5(4):287–364
Peck R, Van Ness J (1982) The use of shrinkage estimators in linear discriminant analysis. IEEE Trans Pattern Anal Mach Intell 5:530–537
Buhlmann P, Van De Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, Berlin
Amin KM (2012) Combinatorial regression and improved basis pursuit for sparse estimation. California Institute of Technology, Pasadena
Witten DM, Tibshirani R (2009) Covariance-regularized regression and classification for high dimensional problems. J R Stat Soc: Ser B (Stat Methodol) 71(3):615–636
Cai T, Liu W (2011) A direct estimation approach to sparse linear discriminant analysis. J Am Stat Assoc 106(496):1566–1577
Clemmensen L, Hastie T, Witten D, Ersboll B (2011) Sparse discriminant analysis. Technometrics 53(4)
Shao J, Wang Y, Deng X, Wang S et al (2011) Sparse linear discriminant analysis by thresholding for high dimensional data. Ann Stat 39(2):1241–1265
Li Y, Jia J et al (2017) L1 least squares for sparse high-dimensional LDA. Electron J Stat 11(1):2499–2518
Baraniuk RG (2007) Compressive sensing. IEEE Signal Process Mag 24(4)
Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15(1):2869–2909
Jankova J, Geer S et al (2015) Confidence intervals for high-dimensional inverse covariance estimation. Electron J Stat 9(1):1205–1229
Cai TT, Wang L (2011) Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans Inf Theory 57(7):4680–4688
Krzanowski WJ, Jonathan P, McCarthy WV, Thomas MR (1995) Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Appl Stat 44:101–115
Ye J (2007) Least squares linear discriminant analysis. In: Proceedings of the 24th international conference on machine learning, pp 1087–1093. ACM
Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432–441
Anderson TW (1962) An introduction to multivariate statistical analysis. Technical report, Wiley, New York
Tropp JA, Gilbert AC (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans Inf Theory 53(12):4655–4666
Globerson A, Roweis ST (2006) Metric learning by collapsing classes. In: Advances in neural information processing systems, pp 451–458
Cai TT, Ren Z, Zhou HH et al (2016) Estimating structured high-dimensional covariance and precision matrices: optimal rates and adaptive estimation. Electron J Stat 10(1):1–59
Rothman AJ, Bickel PJ, Levina E, Zhu J et al (2008) Sparse permutation invariant covariance estimation. Electron J Stat 2:494–515
Witten DM, Friedman JH, Simon N (2011) New insights and faster computations for the graphical lasso. J Comput Graph Stat 20(4):892–900
Yu Y, Wang T, Samworth RJ (2014) A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102(2):315–323
Lin C-J (2017) Libsvm data: classification (binary class)
Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci 99(10):6567–6572
Yang D, Zhang D, Chen L, Qu B (2015) Nationtelescope: monitoring and visualizing large-scale collective behavior in lbsns. J Netw Comput Appl 55:170–180
About this article
Cite this article
Yang, S., Xiong, H., Hu, D. et al. Generalising combinatorial discriminant analysis through conditioning truncated Rayleigh flow. Knowl Inf Syst 63, 2189–2208 (2021). https://doi.org/10.1007/s10115-021-01587-z
DOI: https://doi.org/10.1007/s10115-021-01587-z