Abstract
Prognosis, such as predicting mortality, is a common task in medicine, but it becomes challenging when only small numbers of samples are available, as in rare medical conditions. We propose a framework for classification from small-sample data. Conceptually, our solution is a hybrid of multi-task and transfer learning: it employs data samples from source tasks, as in transfer learning, but considers all tasks together, as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting its data with data from those tasks. The degree of augmentation depends on task relatedness and is estimated directly from the data. We apply the model to three diverse real-world data sets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We also extend the model to online multi-task learning, where the model parameters are updated incrementally given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model that exploits sharing across tasks at the data level together with joint parameter learning.
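To make the data-level sharing concrete, the sketch below shows one plausible reading of the idea: a target task's small training set is augmented with samples borrowed from source tasks, each source sample down-weighted by a task-relatedness score. The weighting scheme and the fixed relatedness values here are illustrative assumptions, not the paper's exact formulation, in which relatedness is itself estimated from the data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_with_task_augmentation(tasks, target, relatedness):
    """Fit a classifier for `target` on its own data plus data borrowed
    from the other tasks, weighted by task relatedness (illustrative)."""
    X = np.vstack([X_t for X_t, _ in tasks.values()])
    y = np.concatenate([y_t for _, y_t in tasks.values()])
    # Target samples get full weight; each source sample is down-weighted
    # by its task's relatedness score in [0, 1].
    w = np.concatenate([
        np.full(len(y_t), 1.0 if name == target else relatedness[name])
        for name, (_, y_t) in tasks.items()
    ])
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y, sample_weight=w)
    return model

# Toy usage: a small target task augmented by a larger, related source task.
rng = np.random.default_rng(0)
tasks = {
    "rare_condition": (rng.normal(size=(20, 5)), rng.integers(0, 2, 20)),
    "common_condition": (rng.normal(size=(200, 5)), rng.integers(0, 2, 200)),
}
clf = fit_with_task_augmentation(tasks, "rare_condition",
                                 relatedness={"common_condition": 0.6})
```

A relatedness of 1 would pool the source task's data as if it were the target's own, while 0 recovers single-task learning; the paper's contribution is choosing this degree of sharing from the data rather than by hand.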
Notes
A detailed description of optimization methods can be found in [25].
Ethics approval was obtained through the University and the hospital (approval no. 12/83).
References
Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243–272
Argyriou A, Micchelli CA, Pontil M, Ying Y (2007) A spectral regularization framework for multi-task structure learning. In: Advances in neural information processing systems, pp 25–32
Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Proceedings of the eleventh annual conference on computational learning theory. ACM, pp 92–100
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
Chelba C, Acero A (2006) Adaptation of maximum entropy capitalizer: little data can help a lot. Comput Speech Lang 20(4):382–399
Chen M, Weinberger KQ, Blitzer J (2011) Co-training for domain adaptation. In: NIPS, pp 2456–2464
Cover TM, Thomas JA (2012) Elements of information theory. Wiley, New York
Daumé III H (2009) Bayesian multitask learning with latent hierarchies. In: Proceedings of the 25th conference on uncertainty in artificial intelligence, pp 135–142
Duan L, Xu D, Tsang IW (2012) Domain adaptation from multiple sources: a domain-dependent regularization approach. IEEE Trans Neural Netw Learn Syst 23(3):504–518
Eaton E, Ruvolo PL (2013) ELLA: an efficient lifelong learning algorithm. In: Proceedings of the 30th international conference on machine learning (ICML-13), pp 507–515
Argyriou A, Evgeniou T, Pontil M (2007) Multi-task feature learning. In: Advances in neural information processing systems: proceedings of the 2006 conference, vol 19. The MIT Press, p 41
Evgeniou T, Pontil M (2004) Regularized multi-task learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 109–117
Gao J, Fan W, Jiang J, Han J (2008) Knowledge transfer via multiple model local structure mapping. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 283–291
Gong P, Ye J, Zhang C (2012) Robust multi-task feature learning. In: Proceedings of the 18th ACM SIGKDD. ACM, pp 895–903
Gupta S, Phung D, Venkatesh S (2012) A Bayesian nonparametric joint factor model for learning shared and individual subspaces from multiple data sources. In: Proceedings of the SDM, pp 200–211
Gupta S, Phung D, Venkatesh S (2013) Factorial multi-task learning: a Bayesian nonparametric approach. In: Proceedings of international conference on machine learning, pp 657–665
Hastie T, Tibshirani R, Friedman JH (2001) The elements of statistical learning, vol 1. Springer, New York
Jalali A, Sanghavi S, Ruan C, Ravikumar PK (2010) A dirty model for multi-task learning. In: Neural information processing systems, pp 964–972
Jebara T (2004) Multi-task feature and kernel selection for SVMs. In: Proceedings of the twenty-first international conference on machine learning. ACM, p 55
Ji S, Ye J (2009) An accelerated gradient method for trace norm minimization. In: Proceedings of the 26th annual international conference on machine learning. ACM, pp 457–464
Kang Z, Grauman K, Sha F (2011) Learning with whom to share in multi-task feature learning. In: Proceedings of the 28th international conference on machine learning, pp 521–528
Lawrence ND, Platt JC (2004) Learning to learn with the informative vector machine. In: Proceedings of the twenty-first international conference on machine learning. ACM, p 65
Liu J, Ji S, Ye J (2009) Multi-task feature learning via efficient l2,1-norm minimization. In: Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, pp 339–348
Liu J, Chen J, Ye J (2009) Large-scale sparse logistic regression. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 547–556
Nemirovski A (2005) Efficient methods in convex programming. Lecture Notes. http://www2.isye.gatech.edu/~nemirovs/
Nesterov Y (2004) Introductory lectures on convex optimization: a basic course, vol 87. Springer, Berlin
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Rai P, Daumé III H (2010) Infinite predictor subspace models for multitask learning. In: International conference on artificial intelligence and statistics, pp 613–620
Raudys SJ, Jain AK (1991) Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Trans Pattern Anal Mach Intell 13(3):252–264
Schmidt M (2010) Graphical model structure learning with l1-regularization. PhD thesis, The University of British Columbia
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58:267–288
Xue Y, Liao X, Carin L, Krishnapuram B (2007) Multi-task learning for classification with dirichlet process priors. J Mach Learn Res 8:35–63
Yang H, Lyu MR, King I (2013) Efficient online learning for multitask feature selection. ACM Trans Knowl Discov Data 7(2):6:1–6:27
Zhang Y, Yeung D-Y (2010) A convex formulation for learning task relationships in multi-task learning. In: UAI, pp 733–742
Cite this article
Saha, B., Gupta, S., Phung, D. et al. Multiple task transfer learning with small sample sizes. Knowl Inf Syst 46, 315–342 (2016). https://doi.org/10.1007/s10115-015-0821-z