
A multi-loss super regression learner (MSRL) with application to survival prediction using proteomics

Abstract

Although many regression techniques have been proposed over the years to handle large numbers of regressors, the complex nature of data emerging from recent high-throughput experiments makes it unlikely that any single technique will successfully model all data types. Thus, multiple algorithms from the collection of modern regression techniques capable of handling high-dimensional regressors should be entertained when analyzing such data. A novel approach to building a super regression learner is proposed, which can be fit to a training data set in order to make future predictions of a continuous outcome. The resulting super regression model is multi-objective in nature and mimics the performance of the best component regression models irrespective of the data type. This is accomplished by combining elements of bootstrap-based risk calculation, rank aggregation, and stacking. The utility of this approach is demonstrated through its use on mass spectrometry data.
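To make the construction concrete, below is a minimal sketch of how the three ingredients named above might fit together: candidate regression models are scored by bootstrap out-of-bag risk under several loss functions, the per-loss rankings are aggregated into a consensus ordering, and the top-ranked models are combined by stacking. This is an illustration under stated assumptions, not the authors' MSRL algorithm: the candidate learners, the two losses, the simple Borda-count aggregation (a crude stand-in for weighted rank aggregation), and every function name below are hypothetical choices made for this example.

```python
# A minimal, hypothetical sketch of a multi-loss super regression learner.
# Not the authors' MSRL implementation: it merely wires together bootstrap
# risk estimation, rank aggregation, and stacking in one plausible way.
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.svm import SVR

def bootstrap_risks(models, X, y, losses, n_boot=20, seed=0):
    """Average out-of-bag risk of each candidate model under each loss."""
    rng = np.random.default_rng(seed)
    n = len(y)
    risks = np.zeros((len(models), len(losses)))
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                # bootstrap resample
        oob = np.setdiff1d(np.arange(n), idx)      # out-of-bag cases
        for i, m in enumerate(models):
            pred = clone(m).fit(X[idx], y[idx]).predict(X[oob])
            for j, loss in enumerate(losses):
                risks[i, j] += loss(y[oob], pred)
    return risks / n_boot

def borda_order(risks):
    """Rank models under each loss, then Borda-aggregate the rank lists
    (a crude stand-in for weighted rank aggregation)."""
    ranks = np.argsort(np.argsort(risks, axis=0), axis=0)  # 0 = best per loss
    return np.argsort(ranks.sum(axis=1))                   # consensus ordering

def stacking_weights(models, X, y, keep):
    """Combine the retained models by regressing y on their predictions,
    with non-negative weights normalized to sum to one."""
    Z = np.column_stack([clone(models[i]).fit(X, y).predict(X) for i in keep])
    w = LinearRegression(positive=True, fit_intercept=False).fit(Z, y).coef_
    return w / w.sum() if w.sum() > 0 else np.full(len(keep), 1.0 / len(keep))

# Usage on synthetic high-dimensional data (100 cases, 200 regressors).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 200))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=100)
models = [Lasso(alpha=0.1), RandomForestRegressor(n_estimators=100), SVR()]
losses = [lambda t, p: np.mean((t - p) ** 2),   # squared-error loss
          lambda t, p: np.mean(np.abs(t - p))]  # absolute-error loss
order = borda_order(bootstrap_risks(models, X, y, losses))
weights = stacking_weights(models, X, y, order[:2])  # stack the top two
```

A more faithful implementation would, among other things, derive the stacking weights from cross-validated rather than in-sample predictions, so that overfit candidates are not rewarded.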

Acknowledgments

This research was supported in part by grants from the National Science Foundation (NSF-DMS-0805559, NSF-DMS-1125909) and the National Institutes of Health (NIH-CA133844). We thankfully acknowledge Johannes Voortman and Thang V. Pham for graciously sharing the Netherlands NSCLC data with us. We thank two anonymous reviewers for numerous constructive suggestions that led to a much better paper.

Author information

Corresponding author

Correspondence to Susmita Datta.

About this article

Cite this article

Shah, J., Datta, S. & Datta, S. A multi-loss super regression learner (MSRL) with application to survival prediction using proteomics. Comput Stat 29, 1749–1767 (2014). https://doi.org/10.1007/s00180-014-0516-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00180-014-0516-z

Keywords

Navigation