Abstract
As powerful tools, machine learning and data mining techniques have been widely applied in many areas. However, in many real-world applications, besides building accurate black-box predictors, we are also interested in white-box mechanisms, such as discovering predictive patterns in data that enhance our understanding of underlying physical, biological, and other natural processes. For these purposes, sparse representation and its variants have been a central focus. More recently, structural sparsity has attracted increasing attention. In previous research, structural sparsity was often achieved by imposing convex but non-smooth norms such as the \({\ell _{2}/\ell _{1}}\) and group \({\ell _{2}/\ell _{1}}\) norms. In this paper, we present explicit \({\ell _2/\ell _0}\) and group \({\ell _2/\ell _0}\) norms to enforce structural sparsity directly. To tackle the resulting intractable \({\ell _2/\ell _0}\) optimization problems, we develop a general Lipschitz auxiliary function that leads to simple iterative algorithms. In each iteration, an optimal solution of the induced subproblem is obtained, and convergence is guaranteed. Furthermore, the local convergence rate is also theoretically bounded. We test our optimization techniques on the multitask feature learning problem. Experimental results suggest that our approaches outperform competing methods on both synthetic and real-world data sets.
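To make the distinction concrete, the following minimal sketch (an illustration, not the paper's algorithm) computes the two row-sparsity measures contrasted in the abstract for a weight matrix \(W\) whose rows correspond to features and columns to tasks: the convex \(\ell _{2}/\ell _{1}\) norm sums the Euclidean norms of the rows, while the explicit \(\ell _{2}/\ell _{0}\) norm counts the nonzero rows. The function names and the tolerance parameter are assumptions introduced here for illustration.

```python
import numpy as np

def l2_row_norms(W):
    # Euclidean norm of each row of W (features x tasks)
    return np.sqrt((W ** 2).sum(axis=1))

def l21_norm(W):
    # Convex surrogate: sum of row-wise l2 norms
    return l2_row_norms(W).sum()

def l20_norm(W, tol=1e-12):
    # Explicit structural sparsity: number of rows that are not (numerically) zero
    return int((l2_row_norms(W) > tol).sum())

W = np.array([[0.0, 0.0, 0.0],   # feature shared by no task
              [3.0, 4.0, 0.0],   # row norm 5
              [0.0, 0.0, 2.0]])  # row norm 2

print(l21_norm(W))  # 7.0
print(l20_norm(W))  # 2
```

Minimizing the \(\ell _{2}/\ell _{0}\) count directly is combinatorial, which is why prior work relied on the convex \(\ell _{2}/\ell _{1}\) surrogate; the paper's contribution is an auxiliary-function scheme that handles the \(\ell _{2}/\ell _{0}\) objective itself.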
Acknowledgments
This research was partially supported by NSF CCF-0830780, 0917274, NSF DMS-0915228, NSF CNS-0923494, 1035913.
Cite this article
Luo, D., Ding, C. & Huang, H. Toward structural sparsity: an explicit \(\ell _{2}/\ell _0\) approach. Knowl Inf Syst 36, 411–438 (2013). https://doi.org/10.1007/s10115-012-0545-2