Abstract
Multi-task learning (MTL) aims to improve overall generalization performance by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic relatedness among tasks, through which informative domain knowledge is shared across tasks, thereby facilitating the learning of each individual task. Modeling the relationship among multiple tasks is critical to the practical performance of MTL. We propose to correlate multiple tasks using a low-rank representation and formulate our MTL approaches as mathematical optimization problems that minimize the empirical loss regularized by the aforementioned low-rank structure together with a separate sparse structure. For the proposed MTL approaches, we develop gradient-based optimization algorithms that efficiently find their globally optimal solutions. We also conduct theoretical analysis of our MTL approaches, deriving performance bounds that characterize how well the integration of low-rank and sparse representations can estimate multiple related tasks.
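To make the formulation concrete, the following numpy sketch illustrates one instance of the kind of objective described above: the task weight matrix is split as \(W = P + Q\), with a nuclear-norm penalty inducing the low-rank structure on \(P\) and an \(\ell_1\) penalty inducing the sparse structure on \(Q\). The squared loss, the fixed step size, and the plain (non-accelerated) alternating proximal updates are illustrative assumptions, not the chapter's exact algorithm; the chapter develops the precise formulations, accelerated gradient methods, and theoretical guarantees.

```python
import numpy as np

def svd_shrink(M, tau):
    """Prox of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_thresh(M, tau):
    """Prox of tau * l1 norm: entrywise soft thresholding."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lowrank_sparse_mtl(Xs, ys, gamma, lam, step=1e-3, iters=1000):
    """Proximal-gradient sketch for
        min_{P,Q} sum_t ||X_t (p_t + q_t) - y_t||^2
                  + gamma * ||P||_* + lam * ||Q||_1,
    where column t of W = P + Q is the weight vector of task t."""
    d, m = Xs[0].shape[1], len(Xs)
    P = np.zeros((d, m))
    Q = np.zeros((d, m))
    for _ in range(iters):
        # Gradient of the summed squared loss w.r.t. W = P + Q
        # (identical for P and Q).
        G = np.zeros((d, m))
        for t in range(m):
            w = P[:, t] + Q[:, t]
            G[:, t] = 2.0 * Xs[t].T @ (Xs[t] @ w - ys[t])
        P = svd_shrink(P - step * G, step * gamma)
        Q = soft_thresh(Q - step * G, step * lam)
    return P, Q

# Tiny usage: three related tasks sharing one underlying weight vector.
rng = np.random.default_rng(0)
Xs = [rng.standard_normal((50, 10)) for _ in range(3)]
w = rng.standard_normal(10)
ys = [X @ w + 0.1 * rng.standard_normal(50) for X in Xs]
P, Q = lowrank_sparse_mtl(Xs, ys, gamma=1.0, lam=0.5)
```

The two proximal operators are the standard ingredients for this class of objective: singular value thresholding handles the nuclear-norm (low-rank) term [13], and entrywise soft thresholding handles the \(\ell_1\) (sparse) term.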
References
J. Abernethy, F. Bach, T. Evgeniou, J.P. Vert, A new approach to collaborative filtering: operator estimation with spectral regularization. J. Mach. Learn. Res. 10, 803–826 (2009)
R.K. Ando, BioCreative II gene mention tagging system at IBM Watson, in Proceedings of the Second BioCreative Challenge Evaluation Workshop (2007)
R.K. Ando, T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817–1853 (2005)
A. Argyriou, T. Evgeniou, M. Pontil, Convex multi-task feature learning. Mach. Learn. 73(3), 243–272 (2008)
B. Bakker, T. Heskes, Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4, 83–99 (2003)
J. Baxter, A model of inductive bias learning. J. Artif. Intell. Res. 12, 149–198 (2000)
A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
D.P. Bertsekas, A. Nedic, A.E. Ozdaglar, Convex Analysis and Optimization (Athena Scientific, Belmont, 2003)
J. Bi, T. Xiong, S. Yu, M. Dundar, R.B. Rao, An improved multi-task learning approach with applications in medical diagnosis, in ECML/PKDD (2008)
P.J. Bickel, Y. Ritov, A.B. Tsybakov, Simultaneous analysis of lasso and dantzig selector. Ann. Stat. 37, 1705–1732 (2009)
S. Bickel, J. Bogojeska, T. Lengauer, T. Scheffer, Multi-task learning for HIV therapy screening, in ICML (2008)
S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004)
J.F. Cai, E.J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
E.J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58(3), 1–37 (2011)
R. Caruana, Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, A.S. Willsky, Sparse and low-rank matrix decompositions, in SYSID (2009)
J. Chen, J. Liu, J. Ye, Learning incoherent sparse and low-rank patterns from multiple tasks. ACM Trans. Knowl. Discov. Data 5(4), 22 (2012)
J. Chen, L. Tang, J. Liu, J. Ye, A convex formulation for learning shared structures from multiple tasks, in ICML (2009)
J. Chen, J. Zhou, J. Ye, Integrating low-rank and group-sparse structures for robust multi-task learning, in KDD (2011)
T. Evgeniou, C.A. Micchelli, M. Pontil, Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6, 615–637 (2005)
M. Fazel, H. Hindi, S. Boyd, A rank minimization heuristic with application to minimum order system approximation, in Proceedings of the American Control Conference (2001)
G.H. Golub, C.F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, 1996)
D. Goldfarb, S. Ma, Convergence of fixed point continuation algorithms for matrix rank minimization. Found. Comput. Math. 11(2), 183–210 (2011)
D. Hsu, S. Kakade, T. Zhang, Robust matrix decomposition with sparse corruptions. IEEE Trans. Inf. Theor. 57(11), 7221–7234 (2011)
L. Jacob, F. Bach, J.P. Vert, Clustered multi-task learning: a convex formulation, in NIPS (2008)
N.D. Lawrence, J.C. Platt, Learning to learn with the informative vector machine, in ICML (2004)
J. Liu, S. Ji, J. Ye, Multi-task feature learning via efficient ℓ2,1-norm minimization, in UAI, pp. 339–348 (2009)
J. Liu, S. Ji, J. Ye, SLEP: Sparse Learning with Efficient Projections (Arizona State University, Tempe, 2009). http://www.public.asu.edu/jye02/Software/SLEP
J. Liu, J. Ye, Efficient Euclidean projections in linear time, in ICML (2009)
K. Lounici, M. Pontil, A.B. Tsybakov, S. van de Geer, Taking advantage of sparsity in multi-task learning, in COLT (2009)
A. Nemirovski, Efficient Methods in Convex Programming. Lecture Notes (1995)
Y. Nesterov, Introductory Lectures on Convex Programming. Lecture Notes (1998)
G. Obozinski, B. Taskar, M. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems. Stat. Comput. 37, 1871–1905 (2009)
T.K. Pong, P. Tseng, S. Ji, J. Ye, Trace norm regularization: reformulations, algorithms, and multi-task learning. SIAM J. Optim. 20(6), 3465–3489 (2010)
B. Recht, M. Fazel, P.A. Parrilo, Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
A. Schwaighofer, V. Tresp, K. Yu, Learning Gaussian process kernels via hierarchical Bayes, in NIPS (2004)
A. Shapiro, Weighted minimum trace factor analysis. Psychometrika 47, 243–264 (1982)
S. Si, D. Tao, B. Geng, Bregman divergence-based regularization for transfer subspace learning. IEEE Trans. Knowl. Data Eng. 22, 929–942 (2010)
J.F. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)
R. Tibshirani, Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B 58, 267–288 (1996)
L. Vandenberghe, S. Boyd, Semidefinite programming. SIAM Rev. 38(1), 49–95 (1996)
D.L. Wallace, Bounds on normal approximations to Student's and the chi-square distributions. Ann. Math. Stat. 30(4), 1121–1130 (1959)
G.A. Watson, Characterization of the subdifferential of some matrix norms. Linear Algebra Appl. 170, 33–45 (1992)
J. Wright, A. Ganesh, S. Rao, Y. Ma, Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization, in NIPS (2009)
H. Xu, C. Caramanis, S. Sanghavi, Robust PCA via outlier pursuit, in NIPS (2010)
Y. Xue, X. Liao, L. Carin, B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors. J. Mach. Learn. Res. 8, 35–63 (2007)
K. Yu, V. Tresp, A. Schwaighofer, Learning Gaussian processes from multiple tasks, in ICML (2005)
J. Zhang, Z. Ghahramani, Y. Yang, Learning multiple related tasks using latent independent component analysis, in NIPS (2005)
J. Zhou, J. Chen, J. Ye, Clustered multi-task learning via alternating structure optimization, in NIPS (2011)
J. Zhou, J. Chen, J. Ye, MALSAR: Multi-Task Learning via Structural Regularization (Arizona State University, Tempe, 2012). http://www.public.asu.edu/jye02/Software/MALSAR
J. Zhou, J. Liu, V.A. Narayan, J. Ye, Modeling disease progression via multi-task learning. NeuroImage 78, 233–248 (2013)
J. Zhou, L. Yuan, J. Liu, J. Ye, A multi-task learning formulation for predicting disease progression, in KDD (2011)
Acknowledgments
Part of this chapter is reprinted with permission from “Chen, J., Liu, J., Ye, J., Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks, ACM Transactions on Knowledge Discovery from Data, Vol. 5:4, © 2012 Association for Computing Machinery, Inc., http://doi.acm.org/10.1145/2086737.2086742” [17] and “Chen, J., Zhou, J., Ye, J., Integrating Low-Rank and Group-Sparse Structures for Robust Multi-task Learning, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Pages 42–50, © 2011 Association for Computing Machinery, Inc., http://doi.acm.org/10.1145/2020408.2020423” [19].
Appendix
Lemma 1
Let \(\delta_1, \delta_2, \ldots, \delta_n\) be a random sample of size \(n\) from the Gaussian distribution \(\fancyscript{N} (0, \sigma )\), where \(\sigma\) denotes the standard deviation. Let \(x_1, x_2, \ldots, x_n\) satisfy \(x_1^2 + x_2^2 + \cdots + x_n^2 = 1\). Denote a random variable \(v\) as
$$v = \frac{1}{\sigma} \sum_{i=1}^{n} x_i \, \delta_i.$$
Then \(v\) obeys the Gaussian distribution \(\fancyscript{N} (0, 1)\).
Proof
Since \(\{ \delta_i \}\) are mutually independent, the mean of the random variable \(v\) can be computed as
$$\mathbb{E} (v) = \frac{1}{\sigma} \sum_{i=1}^{n} x_i \, \mathbb{E} (\delta_i) = 0.$$
Similarly, the variance of \(v\) can be computed as
$$\mathbb{E} \left( v^2 \right) = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i^2 \, \mathbb{E} \left( \delta_i^2 \right) = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i^2 \, \sigma^2 = 1,$$
where the first equality follows from \(\mathbb{E} \left( \delta_i \delta_j \right) = 0~(i \ne j)\). Using the fact that a sum of independent Gaussian random variables is Gaussian distributed, we complete the proof of this lemma.
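A quick Monte Carlo check of Lemma 1 is given below. The defining equation of \(v\) was lost in extraction, so the reading \(v = \frac{1}{\sigma} \sum_i x_i \delta_i\), with \(\sigma\) the standard deviation of the \(\delta_i\), is an assumption reconstructed from the proof; the particular values of \(n\), \(\sigma\), and \(x\) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 8, 2.5, 200_000

# Any fixed unit vector x with x_1^2 + ... + x_n^2 = 1 works.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

# delta_i ~ N(0, sigma^2); v = (1/sigma) * sum_i x_i * delta_i.
delta = sigma * rng.standard_normal((trials, n))
v = (delta @ x) / sigma

print(v.mean(), v.var())  # approximately 0 and 1, i.e., v ~ N(0, 1)
```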
Lemma 2
Let \(\fancyscript{X}_p^2\) be a chi-squared random variable with \(p\) degrees of freedom. Then, for any \(\pi > 0\),
$$\Pr \left( \fancyscript{X}_p^2 \ge p + \pi \right) \le \exp \left( - \frac{1}{2} \left( \pi - p \log \left( 1 + \frac{\pi }{p} \right) \right) \right).$$
Proof
From Theorem 4.1 in [42], we approximate the chi-square distribution using a normal distribution as
$$\Pr \left( \fancyscript{X}_p^2 \ge q \right) \le \Pr \left( \fancyscript{N}_{0,1} \ge z_p(q) \right), \quad q > p,$$
where \(\fancyscript{N}_{0,1} \sim \fancyscript{N} (0, 1)\) and \(z_p(q) = \sqrt{q - p - p \log \left( \frac{q}{p} \right) }\). It is known that for \(x \sim \fancyscript{N} (0, 1)\) and \(t \ge 0\), the inequality \(\Pr \left( x \ge t \right) \le \exp (- \frac{t^2}{2})\) holds. Therefore we have
$$\Pr \left( \fancyscript{X}_p^2 \ge q \right) \le \exp \left( - \frac{z_p(q)^2}{2} \right) = \exp \left( - \frac{1}{2} \left( q - p - p \log \left( \frac{q}{p} \right) \right) \right).$$
By substituting \(q = p + \pi ~(\pi > 0)\) into the inequality above, we complete the proof of this lemma.
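Since the displayed bound in Lemma 2 was reconstructed from the proof steps, a numerical check against the exact chi-square tail is worthwhile; the sketch below uses scipy, with arbitrary choices of \(p\) and \(\pi\).

```python
import numpy as np
from scipy.stats import chi2

# Verify Pr(X_p^2 >= p + pi) <= exp(-(pi - p*log(1 + pi/p)) / 2).
for p in (5, 20, 100):
    for pi in (1.0, 10.0, 50.0):
        tail = chi2.sf(p + pi, df=p)                      # exact tail probability
        bound = np.exp(-(pi - p * np.log1p(pi / p)) / 2)  # Lemma 2 bound
        assert tail <= bound
        print(f"p={p:3d}, pi={pi:5.1f}: {tail:.3e} <= {bound:.3e}")
```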
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this chapter
Chen, J., Zhou, J., Ye, J. (2014). Low-Rank and Sparse Multi-task Learning. In: Fu, Y. (ed.) Low-Rank and Sparse Modeling for Visual Analysis. Springer, Cham. https://doi.org/10.1007/978-3-319-12000-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11999-1
Online ISBN: 978-3-319-12000-3