
Low-Rank and Sparse Multi-task Learning

Chapter in Low-Rank and Sparse Modeling for Visual Analysis

Abstract

Multi-task learning (MTL) aims to improve the overall generalization performance by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which informative domain knowledge from each task can be shared across the tasks and thus facilitate the learning of the individual tasks. Modeling the relationships among multiple tasks is critical to the practical performance of MTL. We propose to correlate multiple tasks using a low-rank representation and formulate our MTL approaches as mathematical optimization problems that minimize the empirical loss regularized by the aforementioned low-rank structure and a separate sparse structure. For the proposed MTL approaches, we develop gradient-based optimization algorithms to efficiently find their globally optimal solutions. We also conduct a theoretical analysis of our MTL approaches, deriving performance bounds that evaluate how well the integration of low-rank and sparse representations can estimate multiple related tasks.
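
To make the formulation concrete, the sketch below illustrates one instance of this class of objectives: a squared empirical loss on the task weight matrix \(W = P + Q\), a trace-norm (low-rank) penalty on \(P\), and an entry-wise \(\ell_1\) (sparse) penalty on \(Q\), minimized by alternating proximal-gradient steps. This is only an illustration of low-rank plus sparse regularization and of the gradient-based optimization mentioned above, not the chapter's exact algorithm; the shared design matrix across tasks, the function names, and the parameter values are all assumptions made for the example.

```python
# Illustrative sketch (not the chapter's exact method): block proximal-gradient steps for
#   0.5 * ||X (P + Q) - Y||_F^2  +  alpha * ||P||_*  +  beta * ||Q||_1,
# i.e., empirical loss + low-rank penalty on P + sparse penalty on Q.
# Assumptions: all tasks share one design matrix X, the loss is squared error,
# and alpha, beta, and the synthetic data below are hypothetical.
import numpy as np


def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * trace norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt


def soft(M, tau):
    """Entry-wise soft thresholding: proximal operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def low_rank_sparse_mtl(X, Y, alpha, beta, iters=300):
    """Alternate proximal-gradient updates of the low-rank part P and the sparse part Q."""
    d, m = X.shape[1], Y.shape[1]                 # feature dimension, number of tasks
    P, Q = np.zeros((d, m)), np.zeros((d, m))
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the smooth loss
    for _ in range(iters):
        grad = X.T @ (X @ (P + Q) - Y)            # gradient of the smooth loss in W = P + Q
        P = svt(P - step * grad, step * alpha)    # low-rank update
        grad = X.T @ (X @ (P + Q) - Y)
        Q = soft(Q - step * grad, step * beta)    # sparse update
    return P, Q


# Tiny synthetic usage example: 5 related tasks whose weights share a rank-1 structure.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
W_true = np.outer(rng.standard_normal(20), rng.standard_normal(5))
Y = X @ W_true + 0.1 * rng.standard_normal((100, 5))
P, Q = low_rank_sparse_mtl(X, Y, alpha=5.0, beta=1.0)
print("rank(P) =", np.linalg.matrix_rank(P, tol=1e-3),
      " nnz(Q) =", int(np.count_nonzero(np.abs(Q) > 1e-6)))
```

The two proximal operators used here, singular value thresholding for the trace norm [13] and soft thresholding for the \(\ell_1\) norm, are the standard building blocks of such gradient-based solvers.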


References

  1. J. Abernethy, F. Bach, T. Evgeniou, J.P. Vert, A new approach to collaborative filtering: operator estimation with spectral regularization. J. Mach. Learn. Res. 10, 803–826 (2009)

  2. R.K. Ando, BioCreative II gene mention tagging system at IBM Watson, in Proceedings of the Second BioCreative Challenge Evaluation Workshop (2007)

  3. R.K. Ando, T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817–1853 (2005)

  4. A. Argyriou, T. Evgeniou, M. Pontil, Convex multi-task feature learning. Mach. Learn. 73(3), 243–272 (2008)

  5. B. Bakker, T. Heskes, Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4, 83–99 (2003)

  6. J. Baxter, A model of inductive bias learning. J. Artif. Intell. Res. 12, 149–198 (2000)

  7. A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  8. D.P. Bertsekas, A. Nedic, A.E. Ozdaglar, Convex Analysis and Optimization (Athena Scientific, Belmont, 2003)

  9. J. Bi, T. Xiong, S. Yu, M. Dundar, R.B. Rao, An improved multi-task learning approach with applications in medical diagnosis, in ECML/PKDD (2008)

  10. P.J. Bickel, Y. Ritov, A.B. Tsybakov, Simultaneous analysis of lasso and Dantzig selector. Ann. Stat. 37, 1705–1732 (2009)

  11. S. Bickel, J. Bogojeska, T. Lengauer, T. Scheffer, Multi-task learning for HIV therapy screening, in ICML (2008)

  12. S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004)

  13. J.F. Cai, E.J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)

  14. E.J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58(3), 1–37 (2011)

  15. R. Caruana, Multitask learning. Mach. Learn. 28(1), 41–75 (1997)

  16. V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, A.S. Willsky, Sparse and low-rank matrix decompositions, in SYSID (2009)

  17. J. Chen, J. Liu, J. Ye, Learning incoherent sparse and low-rank patterns from multiple tasks. ACM Trans. Knowl. Discov. Data 5(4), 22 (2012)

  18. J. Chen, L. Tang, J. Liu, J. Ye, A convex formulation for learning shared structures from multiple tasks, in ICML (2009)

  19. J. Chen, J. Zhou, J. Ye, Integrating low-rank and group-sparse structures for robust multi-task learning, in KDD (2011)

  20. T. Evgeniou, C.A. Micchelli, M. Pontil, Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6, 615–637 (2005)

  21. M. Fazel, H. Hindi, S. Boyd, A rank minimization heuristic with application to minimum order system approximation, in Proceedings of the American Control Conference (2001)

  22. G.H. Golub, C.F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, 1996)

  23. D. Goldfarb, S. Ma, Convergence of fixed point continuation algorithms for matrix rank minimization. Found. Comput. Math. 11(2), 183–210 (2011)

  24. D. Hsu, S. Kakade, T. Zhang, Robust matrix decomposition with sparse corruptions. IEEE Trans. Inf. Theor. 57(11), 7221–7234 (2011)

  25. L. Jacob, F. Bach, J.P. Vert, Clustered multi-task learning: a convex formulation, in NIPS (2008)

  26. N.D. Lawrence, J.C. Platt, Learning to learn with the informative vector machine, in ICML (2004)

  27. J. Liu, S. Ji, J. Ye, Multi-task feature learning via efficient l2,1-norm minimization, in UAI, pp. 339–348 (2009)

  28. J. Liu, S. Ji, J. Ye, SLEP: Sparse Learning with Efficient Projections (Arizona State University, Tempe, 2009). http://www.public.asu.edu/jye02/Software/SLEP

  29. J. Liu, J. Ye, Efficient Euclidean projections in linear time, in ICML (2009)

  30. K. Lounici, M. Pontil, A.B. Tsybakov, S. van de Geer, Taking advantage of sparsity in multi-task learning, in COLT (2008)

  31. A. Nemirovski, Efficient Methods in Convex Programming. Lecture Notes (1995)

  32. Y. Nesterov, Introductory Lectures on Convex Programming. Lecture Notes (1998)

  33. G. Obozinski, B. Taskar, M. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems. Stat. Comput. 37, 1871–1905 (2009)

  34. T.K. Pong, P. Tseng, S. Ji, J. Ye, Trace norm regularization: reformulations, algorithms, and multi-task learning. SIAM J. Optim. 20(6), 3465–3489 (2010)

  35. B. Recht, M. Fazel, P.A. Parrilo, Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

  36. A. Schwaighofer, V. Tresp, K. Yu, Learning Gaussian process kernels via hierarchical Bayes, in NIPS (2004)

  37. A. Shapiro, Weighted minimum trace factor analysis. Psychometrika 47, 243–264 (1982)

  38. S. Si, D. Tao, B. Geng, Bregman divergence-based regularization for transfer subspace learning. IEEE Trans. Knowl. Data Eng. 22, 929–942 (2010)

  39. J.F. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)

  40. R. Tibshirani, Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B 58, 267–288 (1996)

  41. L. Vandenberghe, S. Boyd, Semidefinite programming. SIAM Rev. 38(1), 49–95 (1996)

  42. D.L. Wallace, Bounds on normal approximations to Student's and the chi-square distributions. Ann. Math. Stat. 30(4), 1121–1130 (1959)

  43. G.A. Watson, Characterization of the subdifferential of some matrix norms. Linear Algebra Appl. 170, 33–45 (1992)

  44. J. Wright, A. Ganesh, S. Rao, Y. Ma, Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization, in NIPS (2009)

  45. H. Xu, C. Caramanis, S. Sanghavi, Robust PCA via outlier pursuit, in NIPS (2010)

  46. Y. Xue, X. Liao, L. Carin, B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors. J. Mach. Learn. Res. 8, 35–63 (2007)

  47. K. Yu, V. Tresp, A. Schwaighofer, Learning Gaussian processes from multiple tasks, in ICML (2005)

  48. J. Zhang, Z. Ghahramani, Y. Yang, Learning multiple related tasks using latent independent component analysis, in NIPS (2005)

  49. J. Zhou, J. Chen, J. Ye, Clustered multi-task learning via alternating structure optimization, in NIPS (2011)

  50. J. Zhou, J. Chen, J. Ye, MALSAR: Multi-Task Learning via Structural Regularization (Arizona State University, Tempe, 2012). http://www.public.asu.edu/jye02/Software/MALSAR

  51. J. Zhou, J. Liu, V.A. Narayan, J. Ye, Modeling disease progression via multi-task learning. NeuroImage 78, 233–248 (2013)

  52. J. Zhou, L. Yuan, J. Liu, J. Ye, A multi-task learning formulation for predicting disease progression, in KDD (2011)


Acknowledgments

Part of this chapter is reprinted with permission from “Chen, J., Liu, J., Ye, J., Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks, ACM Transactions on Knowledge Discovery from Data, Vol. 5:4, © 2012 Association for Computing Machinery, Inc., http://doi.acm.org/10.1145/2086737.2086742” [17] and “Chen, J., Zhou, J., Ye, J., Integrating Low-Rank and Group-Sparse Structures for Robust Multi-task Learning, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Pages 42–50, © 2011 Association for Computing Machinery, Inc., http://doi.acm.org/10.1145/2020408.2020423” [19].

Appendix

Lemma 1

Let \(\delta_1, \delta_2, \ldots, \delta_n\) be a random sample of size \(n\) from the Gaussian distribution \(\mathcal{N}(0, \sigma^2)\). Let \(x_1, x_2, \ldots, x_n\) satisfy \(x_1^2 + x_2^2 + \cdots + x_n^2 = 1\). Denote a random variable \(v\) as

$$\begin{aligned} v = \frac{1}{\sigma} \sum_{i=1}^n x_i \delta_{i}. \end{aligned}$$

Then \(v\) obeys the standard Gaussian distribution \(\mathcal{N}(0, 1)\).

Proof

By the linearity of expectation and \(\mathbb{E}(\delta_i) = 0\), the mean of the random variable \(v\) is

$$\begin{aligned} \mathbb{E}(v) = \mathbb{E} \left( \frac{1}{\sigma} \sum_{i=1}^n x_i \delta_{i} \right) = \frac{1}{\sigma} \sum_{i=1}^n x_i \mathbb{E} \left( \delta_{i} \right) = 0. \end{aligned}$$

Since \(\{\delta_i\}\) are mutually independent, the variance of \(v\) can be computed as

$$\begin{aligned} \mathbb{E} \left[ \left( v - \mathbb{E}(v) \right)^2 \right] = \mathbb{E} \left( \frac{1}{\sigma^2} \sum_{i=1}^n x_{i}^2 \delta_{i}^2 \right) = \frac{1}{\sigma^2} \sum_{i=1}^n x_i^2 \mathbb{E} \left( \delta_{i}^2 \right) = 1, \end{aligned}$$

where the first equality follows from \(\mathbb{E}(\delta_i \delta_j) = 0~(i \ne j)\), and the last equality uses \(\mathbb{E}(\delta_i^2) = \sigma^2\) together with \(\sum_{i=1}^n x_i^2 = 1\). Since a linear combination of independent Gaussian random variables is itself Gaussian distributed, \(v \sim \mathcal{N}(0, 1)\), which completes the proof of this lemma.
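
As a quick numerical sanity check of Lemma 1 (not part of the proof), one can simulate \(v\) and verify that its sample mean and variance are close to 0 and 1. The snippet below is only illustrative; the values of \(n\), \(\sigma\), and the number of trials are arbitrary.

```python
# Monte Carlo check of Lemma 1 (illustrative only): with sum_i x_i^2 = 1 and
# delta_i ~ N(0, sigma^2) i.i.d., v = (1/sigma) * sum_i x_i * delta_i should be
# approximately standard normal.  n, sigma, and the number of trials are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 10, 2.0, 200_000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                            # enforce x_1^2 + ... + x_n^2 = 1
delta = sigma * rng.standard_normal((trials, n))  # Gaussian noise with standard deviation sigma
v = (delta @ x) / sigma
print(v.mean(), v.var())                          # both should be close to 0 and 1
```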

Lemma 2

Let \(\mathcal{X}_p^2\) be a chi-squared random variable with \(p\) degrees of freedom. Then

$$\begin{aligned} \Pr \left( \mathcal{X}_p^2 \ge p + \pi \right) \le \exp \left( - \frac{1}{2} \left( \pi - p \log \left( 1 + \frac{\pi}{p} \right) \right) \right), \quad \pi > 0. \end{aligned}$$

Proof

From Theorem 4.1 in [42], the tail probability of the chi-square distribution can be bounded by a standard normal tail as

$$\begin{aligned} \Pr \left( \mathcal{X}_p^2 \ge q \right) \le \Pr \left( \mathcal{N}_{0,1} \ge z_p(q) \right), \quad q > p, \end{aligned}$$

where \(\mathcal{N}_{0,1} \sim \mathcal{N}(0, 1)\) and \(z_p(q) = \sqrt{q - p - p \log \left( \frac{q}{p} \right)}\). It is known that for \(x \sim \mathcal{N}(0, 1)\), the inequality \(\Pr \left( x \ge t \right) \le \exp \left( - \frac{t^2}{2} \right)\) holds for all \(t \ge 0\). Therefore we have

$$\begin{aligned} \Pr \left( \mathcal{X}_p^2 \ge q \right) \le \exp \left( - \frac{1}{2} z_p^2(q) \right). \end{aligned}$$

Substituting \(q = p + \pi~(\pi > 0)\) into the inequality above and noting that \(z_p^2(p + \pi) = \pi - p \log \left( 1 + \frac{\pi}{p} \right)\), we complete the proof of this lemma.
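
The bound of Lemma 2 can likewise be checked numerically. The snippet below (illustrative only; the grid of \(p\) and \(\pi\) values is arbitrary) compares the exact chi-square tail probability with the right-hand side of the lemma.

```python
# Numerical illustration of the Lemma 2 tail bound: the exact chi-square tail
# Pr(X_p^2 >= p + pi) should never exceed exp(-0.5 * (pi - p * log(1 + pi/p))).
# The grid of p and pi values below is arbitrary.
import numpy as np
from scipy.stats import chi2

for p in (5, 20, 100):
    for pi in (1.0, 10.0, 50.0):
        exact = chi2.sf(p + pi, df=p)                         # exact tail probability
        bound = np.exp(-0.5 * (pi - p * np.log1p(pi / p)))    # Lemma 2 upper bound
        print(f"p={p:3d}  pi={pi:5.1f}  tail={exact:.3e}  bound={bound:.3e}")
```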


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Chen, J., Zhou, J., Ye, J. (2014). Low-Rank and Sparse Multi-task Learning. In: Fu, Y. (eds) Low-Rank and Sparse Modeling for Visual Analysis. Springer, Cham. https://doi.org/10.1007/978-3-319-12000-3_8


  • DOI: https://doi.org/10.1007/978-3-319-12000-3_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11999-1

  • Online ISBN: 978-3-319-12000-3

