Abstract
Multi-task learning (MTL) aims to improve overall generalization performance by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic relatedness among tasks, through which informative domain knowledge is shared across tasks, thereby facilitating the learning of each individual task. Modeling the relationship among multiple tasks is critical to the practical performance of MTL. We propose to correlate multiple tasks using a low-rank representation and formulate our MTL approaches as mathematical optimization problems that minimize the empirical loss regularized by the aforementioned low-rank structure together with a separate sparse structure. For the proposed MTL approaches, we develop gradient-based optimization algorithms that efficiently find their globally optimal solutions. We also conduct theoretical analysis of our MTL approaches, deriving performance bounds that characterize how well the integration of low-rank and sparse representations can estimate multiple related tasks.
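To make the formulation concrete, the following numpy sketch illustrates one instance of the kind of objective described above: the task weight matrix is split as \(W = P + Q\), with a nuclear-norm penalty inducing the low-rank structure on \(P\) and an \(\ell_1\) penalty inducing the sparse structure on \(Q\). The squared loss, the fixed step size, and the plain (non-accelerated) alternating proximal updates are illustrative assumptions, not the chapter's exact algorithm; the chapter develops the precise formulations, accelerated gradient methods, and theoretical guarantees.

```python
import numpy as np

def svd_shrink(M, tau):
    """Prox of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_thresh(M, tau):
    """Prox of tau * l1 norm: entrywise soft thresholding."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lowrank_sparse_mtl(Xs, ys, gamma, lam, step=1e-3, iters=1000):
    """Proximal-gradient sketch for
        min_{P,Q} sum_t ||X_t (p_t + q_t) - y_t||^2
                  + gamma * ||P||_* + lam * ||Q||_1,
    where column t of W = P + Q is the weight vector of task t."""
    d, m = Xs[0].shape[1], len(Xs)
    P = np.zeros((d, m))
    Q = np.zeros((d, m))
    for _ in range(iters):
        # Gradient of the summed squared loss w.r.t. W = P + Q
        # (identical for P and Q).
        G = np.zeros((d, m))
        for t in range(m):
            w = P[:, t] + Q[:, t]
            G[:, t] = 2.0 * Xs[t].T @ (Xs[t] @ w - ys[t])
        P = svd_shrink(P - step * G, step * gamma)
        Q = soft_thresh(Q - step * G, step * lam)
    return P, Q

# Tiny usage: three related tasks sharing one underlying weight vector.
rng = np.random.default_rng(0)
Xs = [rng.standard_normal((50, 10)) for _ in range(3)]
w = rng.standard_normal(10)
ys = [X @ w + 0.1 * rng.standard_normal(50) for X in Xs]
P, Q = lowrank_sparse_mtl(Xs, ys, gamma=1.0, lam=0.5)
```

The two proximal operators are the standard ingredients for this class of objective: singular value thresholding handles the nuclear-norm (low-rank) term [13], and entrywise soft thresholding handles the \(\ell_1\) (sparse) term.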
References
J. Abernethy, F. Bach, T. Evgeniou, J.P. Vert, A new approach to collaborative filtering: operator estimation with spectral regularization. J. Mach. Learn. Res. 10, 803–826 (2009)
R.K. Ando, BioCreative II gene mention tagging system at IBM Watson, in Proceedings of the Second BioCreative Challenge Evaluation Workshop (2007)
R.K. Ando, T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817–1853 (2005)
A. Argyriou, T. Evgeniou, M. Pontil, Convex multi-task feature learning. Mach. Learn. 73(3), 243–272 (2008)
B. Bakker, T. Heskes, Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4, 83–99 (2003)
J. Baxter, A model of inductive bias learning. J. Artif. Intell. Res. 12, 149–198 (2000)
A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
D.P. Bertsekas, A. Nedic, A.E. Ozdaglar, Convex Analysis and Optimization (Athena Scientific, Belmont, 2003)
J. Bi, T. Xiong, S. Yu, M. Dundar, R.B. Rao, An improved multi-task learning approach with applications in medical diagnosis, in ECML/PKDD (2008)
P.J. Bickel, Y. Ritov, A.B. Tsybakov, Simultaneous analysis of lasso and dantzig selector. Ann. Stat. 37, 1705–1732 (2009)
S. Bickel, J. Bogojeska, T. Lengauer, T. Scheffer, Multi-task learning for HIV therapy screening, in ICML (2008)
S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004)
J.F. Cai, E.J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
E.J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58(3), 1–37 (2011)
R. Caruana, Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, A.S. Willsky, Sparse and low-rank matrix decompositions, in SYSID (2009)
J. Chen, J. Liu, J. Ye, Learning incoherent sparse and low-rank patterns from multiple tasks. ACM Trans. Knowl. Discov. Data 5(4), 22 (2012)
J. Chen, L. Tang, J. Liu, J. Ye, A convex formulation for learning shared structures from multiple tasks, in ICML (2009)
J. Chen, J. Zhou, J. Ye, Integrating low-rank and group-sparse structures for robust multi-task learning, in KDD (2011)
T. Evgeniou, C.A. Micchelli, M. Pontil, Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6, 615–637 (2005)
M. Fazel, H. Hindi, S. Boyd, A rank minimization heuristic with application to minimum order system approximation, in Proceedings of the American Control Conference (2001)
G.H. Golub, C.F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, 1996)
D. Goldfarb, S. Ma, Convergence of fixed point continuation algorithms for matrix rank minimization. Found. Comput. Math. 11(2), 183–210 (2011)
D. Hsu, S. Kakade, T. Zhang, Robust matrix decomposition with sparse corruptions. IEEE Trans. Inf. Theor. 57(11), 7221–7234 (2011)
L. Jacob, F. Bach, J.P. Vert, Clustered multi-task learning: a convex formulation, in NIPS (2008)
N.D. Lawrence, J.C. Platt, Learning to learn with the informative vector machine, in ICML (2004)
J. Liu, S. Ji, J. Ye, Multi-task feature learning via efficient ℓ2,1-norm minimization, in UAI, pp. 339–348 (2009)
J. Liu, S. Ji, J. Ye, SLEP: Sparse Learning with Efficient Projections (Arizona State University, Tempe, 2009). http://www.public.asu.edu/jye02/Software/SLEP
J. Liu, J. Ye, Efficient Euclidean projections in linear time, in ICML (2009)
K. Lounici, M. Pontil, A.B. Tsybakov, S. van de Geer, Taking advantage of sparsity in multi-task learning, in COLT (2009)
A. Nemirovski, Efficient Methods in Convex Programming. Lecture Notes (1995)
Y. Nesterov, Introductory Lectures on Convex Programming. Lecture Notes (1998)
G. Obozinski, B. Taskar, M. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems. Stat. Comput. 37, 1871–1905 (2009)
T.K. Pong, P. Tseng, S. Ji, J. Ye, Trace norm regularization: reformulations, algorithms, and multi-task learning. SIAM J. Optim. 20(6), 3465–3489 (2010)
B. Recht, M. Fazel, P.A. Parrilo, Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
A. Schwaighofer, V. Tresp, K. Yu, Learning Gaussian process kernels via hierarchical Bayes, in NIPS (2004)
A. Shapiro, Weighted minimum trace factor analysis. Psychometrika 47, 243–264 (1982)
S. Si, D. Tao, B. Geng, Bregman divergence-based regularization for transfer subspace learning. IEEE Trans. Knowl. Data Eng. 22, 929–942 (2010)
J.F. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)
R. Tibshirani, Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B 58, 267–288 (1996)
L. Vandenberghe, S. Boyd, Semidefinite programming. SIAM Rev. 38(1), 49–95 (1996)
D.L. Wallace, Bounds on normal approximations to Student's and the chi-square distributions. Ann. Math. Stat. 30(4), 1121–1130 (1959)
G.A. Watson, Characterization of the subdifferential of some matrix norms. Linear Algebra Appl. 170, 33–45 (1992)
J. Wright, A. Ganesh, S. Rao, Y. Ma, Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization, in NIPS (2009)
H. Xu, C. Caramanis, S. Sanghavi, Robust PCA via outlier pursuit, in NIPS (2010)
Y. Xue, X. Liao, L. Carin, B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors. J. Mach. Learn. Res. 8, 35–63 (2007)
K. Yu, V. Tresp, A. Schwaighofer, Learning Gaussian processes from multiple tasks, in ICML (2005)
J. Zhang, Z. Ghahramani, Y. Yang, Learning multiple related tasks using latent independent component analysis, in NIPS (2005)
J. Zhou, J. Chen, J. Ye, Clustered multi-task learning via alternating structure optimization, in NIPS (2011)
J. Zhou, J. Chen, J. Ye, MALSAR: Multi-Task Learning via Structural Regularization (Arizona State University, Tempe, 2012). http://www.public.asu.edu/jye02/Software/MALSAR
J. Zhou, J. Liu, V.A. Narayan, J. Ye, Modeling disease progression via multi-task learning. NeuroImage 78, 233–248 (2013)
J. Zhou, L. Yuan, J. Liu, J. Ye, A multi-task learning formulation for predicting disease progression, in KDD (2011)
Acknowledgments
Part of this chapter is reprinted with permission from “Chen, J., Liu, J., Ye, J., Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks, ACM Transactions on Knowledge Discovery from Data, Vol. 5:4, © 2012 Association for Computing Machinery, Inc., http://doi.acm.org/10.1145/2086737.2086742” [17] and “Chen, J., Zhou, J., Ye, J., Integrating Low-Rank and Group-Sparse Structures for Robust Multi-task Learning, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Pages 42–50, © 2011 Association for Computing Machinery, Inc., http://doi.acm.org/10.1145/2020408.2020423” [19].
Appendix
Lemma 1
Let \(\delta_1, \delta_2, \ldots, \delta_n\) be a random sample of size \(n\) from the Gaussian distribution \(\fancyscript{N} (0, \sigma )\), where \(\sigma\) denotes the standard deviation. Let \(x_1, x_2, \ldots, x_n\) satisfy \(x_1^2 + x_2^2 + \cdots + x_n^2 = 1\). Denote a random variable \(v\) as
$$v = \frac{1}{\sigma} \sum_{i=1}^{n} x_i \, \delta_i.$$
Then \(v\) obeys the Gaussian distribution \(\fancyscript{N} (0, 1)\).
Proof
Since \(\{ \delta_i \}\) are mutually independent, the mean of the random variable \(v\) can be computed as
$$\mathbb{E} (v) = \frac{1}{\sigma} \sum_{i=1}^{n} x_i \, \mathbb{E} (\delta_i) = 0.$$
Similarly, the variance of \(v\) can be computed as
$$\mathbb{E} \left( v^2 \right) = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i^2 \, \mathbb{E} \left( \delta_i^2 \right) = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i^2 \, \sigma^2 = 1,$$
where the first equality follows from \(\mathbb{E} \left( \delta_i \delta_j \right) = 0~(i \ne j)\). Using the fact that a sum of independent Gaussian random variables is Gaussian distributed, we complete the proof of this lemma.
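A quick Monte Carlo check of Lemma 1 is given below. The defining equation of \(v\) was lost in extraction, so the reading \(v = \frac{1}{\sigma} \sum_i x_i \delta_i\), with \(\sigma\) the standard deviation of the \(\delta_i\), is an assumption reconstructed from the proof; the particular values of \(n\), \(\sigma\), and \(x\) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 8, 2.5, 200_000

# Any fixed unit vector x with x_1^2 + ... + x_n^2 = 1 works.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

# delta_i ~ N(0, sigma^2); v = (1/sigma) * sum_i x_i * delta_i.
delta = sigma * rng.standard_normal((trials, n))
v = (delta @ x) / sigma

print(v.mean(), v.var())  # approximately 0 and 1, i.e., v ~ N(0, 1)
```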
Lemma 2
Let \(\fancyscript{X}_p^2\) be a chi-squared random variable with \(p\) degrees of freedom. Then, for any \(\pi > 0\),
$$\Pr \left( \fancyscript{X}_p^2 \ge p + \pi \right) \le \exp \left( - \frac{1}{2} \left( \pi - p \log \left( 1 + \frac{\pi }{p} \right) \right) \right).$$
Proof
From Theorem 4.1 in [42], we approximate the chi-square distribution using a normal distribution as
$$\Pr \left( \fancyscript{X}_p^2 \ge q \right) \le \Pr \left( \fancyscript{N}_{0,1} \ge z_p(q) \right), \quad q > p,$$
where \(\fancyscript{N}_{0,1} \sim \fancyscript{N} (0, 1)\) and \(z_p(q) = \sqrt{q - p - p \log \left( \frac{q}{p} \right) }\). It is known that for \(x \sim \fancyscript{N} (0, 1)\) and \(t \ge 0\), the inequality \(\Pr \left( x \ge t \right) \le \exp (- \frac{t^2}{2})\) holds. Therefore we have
$$\Pr \left( \fancyscript{X}_p^2 \ge q \right) \le \exp \left( - \frac{z_p(q)^2}{2} \right) = \exp \left( - \frac{1}{2} \left( q - p - p \log \left( \frac{q}{p} \right) \right) \right).$$
By substituting \(q = p + \pi ~(\pi > 0)\) into the inequality above, we complete the proof of this lemma.
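Since the displayed bound in Lemma 2 was reconstructed from the proof steps, a numerical check against the exact chi-square tail is worthwhile; the sketch below uses scipy, with arbitrary choices of \(p\) and \(\pi\).

```python
import numpy as np
from scipy.stats import chi2

# Verify Pr(X_p^2 >= p + pi) <= exp(-(pi - p*log(1 + pi/p)) / 2).
for p in (5, 20, 100):
    for pi in (1.0, 10.0, 50.0):
        tail = chi2.sf(p + pi, df=p)                      # exact tail probability
        bound = np.exp(-(pi - p * np.log1p(pi / p)) / 2)  # Lemma 2 bound
        assert tail <= bound
        print(f"p={p:3d}, pi={pi:5.1f}: {tail:.3e} <= {bound:.3e}")
```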
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this chapter
Chen, J., Zhou, J., Ye, J. (2014). Low-Rank and Sparse Multi-task Learning. In: Fu, Y. (ed.) Low-Rank and Sparse Modeling for Visual Analysis. Springer, Cham. https://doi.org/10.1007/978-3-319-12000-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11999-1
Online ISBN: 978-3-319-12000-3