
Sparse optimal discriminant clustering


Abstract

In this manuscript, we reinvestigate an existing clustering procedure, optimal discriminant clustering (ODC; Zhang and Dai in Adv Neural Inf Process Syst 23(12):2241–2249, 2009), and propose to use cross-validation to select its tuning parameter. Furthermore, because many features in high-dimensional data may be non-informative for clustering, we develop a variation of ODC, sparse optimal discriminant clustering (SODC), by adding a group-lasso type of penalty to ODC. We also demonstrate that both ODC and SODC can be used as dimension reduction tools for data visualization in cluster analysis.


References

  • Ben-David, S., Von Luxburg, U., Pal, D.: A sober look at clustering stability. 19th Annual Conference on Learning Theory (COLT 2006) 4005, 5–19 (2006)

  • Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. Pac. Symp. Biocomput. 7, 6–17 (2002)


  • Bouveyron, C., Brunet, C.: Simultaneous model-based clustering and visualization in the Fisher discriminative subspace. Stat. Comput. 22(1), 301–324 (2012)


  • Calinski, R.B., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Simul. Comput. 3(1), 1–27 (1974)


  • Cattell, R.B.: The scree test for the number of factors. Multivar. Behav. Res. 1(2), 245–276 (1966)


  • Chang, W.: On using principal components before separating a mixture of two multivariate normal distributions. Appl. Stat. 32(3), 267–275 (1998)


  • Clemmensen, L., Hastie, T., Witten, D.M., Ersboll, B.: Sparse discriminant analysis. Technometrics 53(4), 406–413 (2011)


  • Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960)


  • De la Torre, F., Kanade, T.: Discriminative cluster analysis. In: The 23rd International Conference on Machine Learning, pp. 241–248 (2006)

  • Fang, Y., Wang, J.: Selection of the number of clusters via the bootstrap method. Comput. Stat. Data Anal. 56(3), 468–477 (2012)


  • Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78(383), 553–584 (1983)


  • Friedman, J.H., Meulman, J.J.: Clustering objects on subsets of attributes (with discussion). J. R. Stat. Soc. Ser. B 66(4), 815–849 (2004)


  • Friedman, J.H., Tukey, J.W.: A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput. C–23(9), 881–890 (1974)


  • Gnanadesikan, R.: Methods for Statistical Data Analysis of Multivariate Observations, 2nd edn. Wiley, New York (1997)


  • Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975)


  • Hastie, T., Tibshirani, R., Buja, A.: Flexible discriminant analysis by optimal scoring. J. Am. Stat. Assoc. 89, 1255–1270 (1994)


  • Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics, 2nd edn. Springer, New York (2009)


  • Johnson, S.C.: Hierarchical clustering schemes. Psychometrika 2, 241–254 (1967)


  • Jones, M.C., Sibson, R.: What is projection pursuit? J. R. Stat. Soc. Ser. A 150(1), 1–37 (1987)


  • Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990)


  • Krzanowski, W.J., Lai, Y.T.: A criterion for determining the number of groups in a data set using sum-of-squares clustering. Biometrics 44(1), 23–34 (1988)


  • Lange, T., Roth, V., Braun, M., Buhmann, J.: Stability-based validation of clustering solutions. Neural Comput. 16(6), 1299–1323 (2004)


  • MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability 1, 281–297 (1967)

  • Maugis, C., Celeux, G., Martin-Magniette, M.L.: Variable selection in model-based clustering: a general variable role modeling. Comput. Stat. Data Anal. 53(11), 3872–3882 (2009)


  • Melnykov, V., Chen, W.-C., Maitra, R.: MixSim: an R package for simulating data to study performance of clustering algorithms. J. Stat. Softw. 51(12), 1–25 (2012)


  • Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 14, 849–856 (2001)

  • Raftery, A.E., Dean, N.: Variable selection for model-based clustering. J. Am. Stat. Assoc. 101(473), 168–178 (2006)


  • Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)


  • Rocci, R., Gattone, S.F., Vichi, M.: A new dimension reduction method: factor discriminant K-means. J. Classif. 28, 210–226 (2011)


  • Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)


  • Steinley, D., Brusco, M.J.: A new variable weighting and selection procedure for K-means cluster analysis. Multivar. Behav. Res. 43(1), 77–108 (2008)


  • Sugar, C., James, G.: Finding the number of clusters in a data set: an information theoretic approach. J. Am. Stat. Assoc. 98(463), 750–763 (2003)


  • Sun, L., Ji, S., Ye, J.: A least squares formulation for canonical correlation analysis. In: The 25th International Conference Machine Learning, pp. 1024–1031 (2008)

  • Sun, W., Wang, J., Fang, Y.: Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electron. J. Stat. 6, 148–167 (2012)


  • Sun, W., Wang, J., Fang, Y.: Consistent selection of tuning parameters via variable selection stability. J. Mach. Learn. Res. 14, 3419–3440 (2013)


  • Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B 63(2), 411–423 (2001)


  • Tyler, D.E., Critchley, F., Dümbgen, L., Oja, H.: Invariant co-ordinate selection (with discussion). J. R. Stat. Soc. Ser. B 71(3), 549–592 (2009)


  • Wang, J.: Consistent selection of the number of clusters via crossvalidation. Biometrika 97(4), 893–904 (2010)


  • Witten, D.M., Tibshirani, R.: A framework for feature selection in clustering. J. Am. Stat. Assoc. 105(490), 713–726 (2010)


  • Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68(1), 49–67 (2006)


  • Zhang, Z., Dai, G.: Optimal scoring for unsupervised learning. Adv. Neural Inf. Process. Syst. 23(12), 2241–2249 (2009)


  • Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67(2), 301–320 (2005)


  • Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265–286 (2006)



Author information

Corresponding author

Correspondence to Yanhong Wang.

Appendix

Given \(\mathbf{Y}\), the sub-gradient equations of (3) with respect to \(W\) are

$$\begin{aligned}&- 2\widetilde{\mathbf{X}}_j^{'} \left( \mathbf{Y}- \sum _{l=1}^p \widetilde{\mathbf{X}}_l w_l\right) + 2\lambda _2 w_{j}\\&\quad + \lambda _1\frac{w_{j}}{\Vert w_{j}\Vert _2} = 0, \quad j=1, \ldots , p. \end{aligned}$$
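Equation (3) itself is not reproduced in this excerpt; for orientation only, a criterion consistent with the sub-gradient above (our reconstruction, not quoted from the paper) is

$$\begin{aligned} \min _{W}\; \Vert \mathbf{Y}-\widetilde{\mathbf{X}}W\Vert _F^2 + \lambda _2 \Vert W\Vert _F^2 + \lambda _1 \sum _{j=1}^p \Vert w_{j}\Vert _2, \end{aligned}$$

where \(w_j\) denotes the \(j\)-th row of \(W\).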

Let the minimizer of (3) be \(\widehat{W}=\left( \widehat{w}_1, \ldots , \widehat{w}_p\right) '\). If

$$\begin{aligned} \left\| \widetilde{\mathbf{X}}_j^{'}\left( \mathbf{Y}- \sum _{l\ne j} \widetilde{\mathbf{X}}_l\widehat{w}_l\right) \right\| _2< \frac{\lambda _1}{2}, \end{aligned}$$

then \(\widehat{w}_j=0\). (Therefore, \(\widehat{w}_j=0\) for every \(j\) if \(\lambda _1>\lambda ^{\max }_1=2\max _{j}\Vert \widetilde{\mathbf{X}}_j^{'}\mathbf{Y}\Vert _2\).) Otherwise,

$$\begin{aligned} \widehat{w}_j= \left( \widetilde{\mathbf{X}}_j^{'} \widetilde{\mathbf{X}}_j+\lambda _2+\frac{\lambda _1 }{2\Vert \widehat{w}_{j} \Vert _2}\right) ^{-1}V_j, \end{aligned}$$

where \(V_j=\widetilde{\mathbf{X}}_j^{'}(\mathbf{Y}- \sum _{l\ne j} \widetilde{\mathbf{X}}_l\widehat{w}_l)\). Note that \(\widetilde{\mathbf{X}}_j^{'} \widetilde{\mathbf{X}}_j\) is a diagonal matrix whose diagonal entries are the sample variances of the features. If the design matrix is standardized at the outset, then \(\widetilde{\mathbf{X}}_j^{'} \widetilde{\mathbf{X}}_j=I_{k-1}\) and the above equation becomes

$$\begin{aligned} \widehat{w}_j = \left( \frac{2\Vert \widehat{w}_{j}\Vert _{2}}{\lambda _1+ 2(1+\lambda _2)\Vert \widehat{w}_{j}\Vert _{2}}\right) V_{j}. \end{aligned}$$

Taking Euclidean norms on both sides gives \(\Vert \widehat{w}_{j}\Vert _{2}=\frac{2\Vert V_{j}\Vert _2-\lambda _1}{2(1+\lambda _2)}\). Plugging this norm into the above formula for \(\widehat{w}_j\), we obtain the expression for \(\widehat{w}_j\) stated in the theorem.
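To make the blockwise update above concrete, the following is a minimal NumPy sketch of the resulting coordinate-descent step, assuming the columns of \(\widetilde{\mathbf{X}}\) have been standardized and the score matrix \(\mathbf{Y}\) is held fixed; the function name, iteration cap, and convergence rule are illustrative choices, not part of the paper.

```python
import numpy as np

def update_W(X, Y, lam1, lam2, n_iter=100, tol=1e-6):
    """Blockwise coordinate descent for W with the score matrix Y held fixed.

    Uses the closed-form per-feature update derived in the appendix:
    w_j = 0 if ||V_j||_2 < lam1/2, otherwise
    w_j = (2 ||V_j||_2 - lam1) / (2 (1 + lam2) ||V_j||_2) * V_j.
    Assumes the columns of X are standardized (X_j' X_j = 1).
    """
    n, p = X.shape
    q = Y.shape[1]                    # k - 1 scoring directions
    W = np.zeros((p, q))
    R = Y - X @ W                     # residual Y - X W
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            xj = X[:, j]
            # V_j = X_j'(Y - sum_{l != j} X_l w_l) = X_j'(R + X_j w_j)
            Vj = xj @ (R + np.outer(xj, W[j]))
            vnorm = np.linalg.norm(Vj)
            if vnorm < lam1 / 2.0:    # group soft-thresholding drops feature j
                w_new = np.zeros(q)
            else:
                w_new = (2.0 * vnorm - lam1) / (2.0 * (1.0 + lam2) * vnorm) * Vj
            R += np.outer(xj, W[j] - w_new)   # keep the residual in sync
            max_change = max(max_change, np.max(np.abs(w_new - W[j])))
            W[j] = w_new
        if max_change < tol:
            break
    return W
```

A zero row of the returned \(W\) corresponds to a feature excluded from the discriminant directions, which is how the group-lasso penalty yields feature selection in SODC.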


About this article


Cite this article

Wang, Y., Fang, Y. & Wang, J. Sparse optimal discriminant clustering. Stat Comput 26, 629–639 (2016). https://doi.org/10.1007/s11222-015-9547-8

