
Learning the coordinate gradients

Advances in Computational Mathematics

Abstract

In this paper we study the problem of learning the gradient function, with applications to variable selection and to determining variable covariation. First, we propose a novel unifying framework for coordinate gradient learning from the perspective of multi-task learning. Various variable selection methods can be regarded as special instances of this framework. Second, we formulate the dual problems of gradient learning with general loss functions. This enables the direct application of standard optimization toolboxes to gradient learning. For instance, gradient learning with the SVM loss can be solved by quadratic programming (QP) routines. Third, we propose a novel gradient learning formulation that can be cast as a kernel matrix learning problem, and we highlight its relation to sparse regularization. A semi-infinite linear programming (SILP) approach and an iterative optimization approach are proposed to solve this problem efficiently. Finally, we validate the proposed approaches on both synthetic and real datasets.
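To make the idea of gradient learning concrete, the following is a minimal sketch and not the formulation proposed in this paper. It assumes the simpler least-squares setting in which a vector-valued function f is fitted so that f(x_i)·(x_j − x_i) approximates y_j − y_i for nearby pairs, with a Gaussian kernel, Gaussian pair weights, and an RKHS-norm penalty. The function names, the values of `lam` and `sigma`, the toy data, and the use of the empirical norm of each gradient coordinate as a variable-relevance score are all illustrative assumptions.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gaussian kernel matrix exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def learn_gradient_ls(X, y, lam=1e-3, sigma=1.0):
    """Least-squares gradient learning sketch (hypothetical helper).

    Minimises over coefficients C (n x d), with f(x) = sum_k K(x_k, x) C[k]:
        (1/n^2) sum_ij w_ij (y_i - y_j + f(x_i).(x_j - x_i))^2
        + lam * sum_l C[:, l]^T K C[:, l]
    Returns an (n x d) array whose row i estimates the gradient at x_i.
    """
    n, d = X.shape
    K = gaussian_gram(X, sigma)            # kernel matrix
    w = gaussian_gram(X, sigma).ravel()    # pair weights (same form, an assumption)
    diff = X[None, :, :] - X[:, None, :]   # diff[i, j] = x_j - x_i
    # Design matrix: row (i, j), column (k, l) holds K[i, k] * diff[i, j, l].
    A = np.einsum("ik,ijl->ijkl", K, diff).reshape(n * n, n * d)
    b = (y[None, :] - y[:, None]).ravel()  # b[i, j] = y_j - y_i
    AtW = A.T * w
    H = AtW @ A / n**2 + lam * np.kron(K, np.eye(d))
    g = AtW @ b / n**2
    # lstsq tolerates a (near-)singular system matrix.
    c = np.linalg.lstsq(H, g, rcond=None)[0]
    return K @ c.reshape(n, d)

# Toy data: only the first of three variables influences the response.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 3))
y = 2.0 * X[:, 0]

G = learn_gradient_ls(X, y)
# Empirical norm of each gradient coordinate, used here as a relevance score.
scores = np.sqrt(np.mean(G ** 2, axis=0))
print(scores)
```

A large score for a coordinate suggests that the response varies along that variable. The design matrix above has n² rows, so this direct solve is only practical for small samples.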



Author information

Corresponding author

Correspondence to Yiming Ying.

Additional information

Communicated by Yuesheng Xu.

About this article

Cite this article

Ying, Y., Wu, Q. & Campbell, C. Learning the coordinate gradients. Adv Comput Math 37, 355–378 (2012). https://doi.org/10.1007/s10444-011-9211-6

  • DOI: https://doi.org/10.1007/s10444-011-9211-6
