Abstract
Multi-task learning (MTL), which optimizes multiple related learning tasks at the same time, has been widely used in applications including natural language processing, speech recognition, computer vision, multimedia data processing, biomedical imaging, socio-biological data analysis, and multi-modality data analysis. MTL is sometimes also referred to as joint learning, and it is closely related to other machine learning subfields such as multi-class learning, transfer learning, and learning with auxiliary tasks. In this paper, we provide a brief review of this topic, discuss the motivation behind MTL, compare various MTL algorithms, review MTL methods for incomplete data, and discuss the application of MTL in deep learning. We aim to give readers a simple way to understand MTL without too many complicated equations, and to help them apply MTL in their own applications.
About this article
Cite this article
Thung, KH., Wee, CY. A brief review on multi-task learning. Multimed Tools Appl 77, 29705–29725 (2018). https://doi.org/10.1007/s11042-018-6463-x
DOI: https://doi.org/10.1007/s11042-018-6463-x