
Toward structural sparsity: an explicit \(\ell _{2}/\ell _0\) approach

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

As powerful tools, machine learning and data mining techniques have been widely applied in various areas. In many real-world applications, however, besides building accurate black-box predictors, we are also interested in white-box mechanisms, such as discovering predictive patterns in data that enhance our understanding of the underlying physical, biological, and other natural processes. For these purposes, sparse representation and its variations have been a major focus. More recently, structural sparsity has attracted increasing attention. In previous research, structural sparsity was typically achieved by imposing convex but non-smooth norms such as the \({\ell _{2}/\ell _{1}}\) and group \({\ell _{2}/\ell _{1}}\) norms. In this paper, we present the explicit \({\ell _2/\ell _0}\) and group \({\ell _2/\ell _0}\) norms to directly approach structural sparsity. To tackle the intractable \({\ell _2/\ell _0}\) optimization problems, we develop a general Lipschitz auxiliary function that leads to simple iterative algorithms. In each iteration, the optimal solution of the induced subproblem is achieved, and a guarantee of convergence is provided. Furthermore, the local convergence rate is also theoretically bounded. We test our optimization techniques on the multitask feature learning problem. Experimental results suggest that our approaches outperform competing approaches on both synthetic and real-world data sets.
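As background for the norms named above, the following is a minimal numpy sketch (illustrative only, not the paper's implementation) contrasting the convex \({\ell _{2}/\ell _{1}}\) norm, the sum of row-wise \(\ell _2\) norms, with the explicit \({\ell _2/\ell _0}\) norm, the count of nonzero rows, on a shared weight matrix \(W\) whose rows correspond to features across tasks:

```python
import numpy as np

def l21_norm(W: np.ndarray) -> float:
    """l2/l1 norm: sum of the l2 norms of the rows of W (convex surrogate)."""
    return float(np.linalg.norm(W, axis=1).sum())

def l20_norm(W: np.ndarray, tol: float = 1e-12) -> int:
    """l2/l0 norm: number of rows of W with nonzero l2 norm (explicit row count)."""
    return int((np.linalg.norm(W, axis=1) > tol).sum())

# A toy weight matrix: row 0 is zeroed out (feature unused by all tasks).
W = np.array([[0.0, 0.0, 0.0],
              [3.0, 4.0, 0.0],   # row l2 norm = 5
              [0.0, 0.0, 2.0]])  # row l2 norm = 2

print(l21_norm(W))  # 7.0
print(l20_norm(W))  # 2
```

Penalizing either quantity drives entire rows of \(W\) to zero, which is what makes the sparsity "structural": a feature is selected or discarded jointly across all tasks rather than per task.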





Acknowledgments

This research was partially supported by NSF CCF-0830780, 0917274, NSF DMS-0915228, NSF CNS-0923494, 1035913.

Author information


Corresponding author

Correspondence to Heng Huang.


About this article

Cite this article

Luo, D., Ding, C. & Huang, H. Toward structural sparsity: an explicit \(\ell _{2}/\ell _0\) approach. Knowl Inf Syst 36, 411–438 (2013). https://doi.org/10.1007/s10115-012-0545-2
