Boosting for Multivariate Longitudinal Responses

  • Original Research
  • Published in SN Computer Science

Abstract

Boosting, a machine learning approach, has gained popularity over the years in its application to various types of data, including longitudinal data. However, its application to data involving multivariate responses is limited. In this article, we present a new approach in which we apply gradient boosting, a generic form of boosting, to model multivariate longitudinal responses. Our approach can handle time-varying covariates as well as high-dimensional covariates and responses when some of the covariates and responses are pure noise. A key feature of our approach is that it is designed to select covariates that affect responses differently in different time intervals; thereby, the overall effect of any covariate can be dissected and represented as a function of time. A novel feature of our approach is that, in addition to covariate selection, we also perform response selection for different time intervals. This helps to identify and order responses based on their importance for a given time interval. Simulation results show that the prediction performance of our approach does not deteriorate in high-dimensional settings and that it can approximate the true model. Application of our approach to clinical laboratory data evaluates the behavior of bilirubin and creatinine in heart failure patients before and after heart transplant, and identifies important risk factors that affect their behavior. Our approach can be implemented using the R package BoostMLR.
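The sketch below illustrates how a model of this kind might be fit in R with the BoostMLR package on simulated multivariate longitudinal data. It is a minimal, hedged example: the function and argument names (BoostMLR, predictBoostMLR, x, tm, id, y, M, nu) and the simulated data are assumptions made for illustration and should be checked against the package documentation.

```r
## Minimal sketch; function and argument names are assumed, not verified
## against the BoostMLR documentation.
library(BoostMLR)

## Simulate longitudinal data: n subjects, 3-8 visits each,
## p time-varying covariates, and q = 2 correlated responses.
set.seed(1)
n  <- 100
ni <- sample(3:8, n, replace = TRUE)                  # visits per subject
id <- rep(seq_len(n), times = ni)                     # subject identifier per row
tm <- unlist(lapply(ni, function(k) sort(runif(k))))  # measurement times
p  <- 10
x  <- matrix(rnorm(length(id) * p), ncol = p)         # time-varying covariates
y  <- cbind(y1 = x[, 1] * tm + rnorm(length(id), sd = 0.5),
            y2 = x[, 2] * (1 - tm) + rnorm(length(id), sd = 0.5))

## Fit the multivariate longitudinal boosting model
## (M = number of boosting iterations, nu = learning rate; names assumed).
fit <- BoostMLR(x = x, tm = tm, id = id, y = y, M = 200, nu = 0.05)

## In-sample prediction; a held-out set of subjects would normally be used.
pred <- predictBoostMLR(Object = fit, x = x, tm = tm, id = id, y = y)
```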



Acknowledgements

The Heart, Vascular and Thoracic Institute, Cleveland Clinic, provided funding for this research.

Author information

Corresponding author: Amol Pande.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 553 KB)


About this article


Cite this article

Pande, A., Ishwaran, H. & Blackstone, E. Boosting for Multivariate Longitudinal Responses. SN COMPUT. SCI. 3, 186 (2022). https://doi.org/10.1007/s42979-022-01072-6

