Abstract
In this paper we study the problem of learning the gradient function, with applications to variable selection and the determination of variable covariation. First, we propose a novel unifying framework for coordinate gradient learning from the perspective of multi-task learning; various variable selection methods can be regarded as special instances of this framework. Second, we formulate the dual problems of gradient learning with general loss functions. This enables the direct application of standard optimization toolboxes to gradient learning; for instance, gradient learning with the SVM loss can be solved by quadratic programming (QP) routines. Third, we propose a novel gradient learning formulation which can be cast as a problem of learning the kernel matrix, and we highlight its relation to sparse regularization. A semi-infinite linear programming (SILP) approach and an iterative optimization approach are proposed to solve this problem efficiently. Finally, we validate our proposed approaches on both synthetic and real datasets.
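To make the setting concrete, the following is a minimal sketch of kernel-based gradient learning with a least-squares loss: the gradient estimate is expanded over training points with a scalar Gaussian kernel, fitted by penalizing first-order Taylor residuals between nearby samples, and the per-coordinate norms of the learned gradient serve as a variable-importance score. The kernel choice, bandwidths, and regularization parameter here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends only on the first of three coordinates.
n, d = 40, 3
X = rng.uniform(-1, 1, size=(n, d))
y = 3.0 * X[:, 0] + 0.05 * rng.standard_normal(n)

# Gaussian scalar kernel K and locality weights w_ij (illustrative bandwidths).
def gauss(A, B, s):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s ** 2))

K = gauss(X, X, 1.0)   # kernel matrix, n x n
W = gauss(X, X, 0.5)   # Taylor-expansion locality weights w_ij
lam = 1e-3             # regularization parameter (illustrative)

# Least-squares design for the Taylor residual
#   r_ij = y_i - y_j - f(x_j) . (x_i - x_j),
# where f(x_j) = sum_p K[j, p] c_p and the unknowns are vec(C), C in R^{n x d}.
A = np.zeros((n * n, n * d))
t = np.zeros(n * n)
w = np.zeros(n * n)
for i in range(n):
    for j in range(n):
        k = i * n + j
        diff = X[i] - X[j]                    # x_i - x_j
        A[k] = np.outer(K[j], diff).ravel()   # coeff of c_{p,q} is K[j,p]*diff[q]
        t[k] = y[i] - y[j]
        w[k] = W[i, j]

# Regularized normal equations:
#   (A^T diag(w) A / n^2 + lam * (K kron I_d)) vec(C) = A^T diag(w) t / n^2
M = np.kron(K, np.eye(d))
lhs = A.T @ (w[:, None] * A) / n ** 2 + lam * M
rhs = A.T @ (w * t) / n ** 2
C = np.linalg.solve(lhs, rhs).reshape(n, d)

# Estimated gradients at the training points and a variable-importance score:
# coordinates with a near-zero gradient norm are candidates for removal.
G = K @ C                                   # G[j] ~ gradient estimate at x_j
importance = np.sqrt((G ** 2).mean(axis=0))
print(importance)  # the first coordinate should dominate
```

On this toy problem the importance score for the informative coordinate dominates the two noise coordinates, illustrating how a learned gradient supports variable selection.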
Communicated by Yuesheng Xu.
Cite this article
Ying, Y., Wu, Q. & Campbell, C. Learning the coordinate gradients. Adv Comput Math 37, 355–378 (2012). https://doi.org/10.1007/s10444-011-9211-6
Keywords
- Learning the gradient
- Multi-task kernel
- Feature selection
- Sparse regularization
- Learning the kernel matrix