Boosting for Multivariate Longitudinal Responses

  • Original Research
  • Published in SN Computer Science

Abstract

Boosting, a machine learning approach, has gained popularity over the years in its application to various types of data, including longitudinal data. However, its application to data involving multivariate responses is limited. In this article, we present a new approach in which we apply gradient boosting, a generic form of boosting, to model multivariate longitudinal responses. Our approach can handle time-varying covariates as well as high-dimensional covariates and responses when some of the covariates and responses are pure noise. A key feature of our approach is that it is designed to select covariates that affect responses differently in different time intervals; thereby, the overall effect of any covariate can be dissected and represented as a function of time. A novel feature of our approach is that, in addition to covariate selection, we also perform response selection for different time intervals. This helps to identify and order responses based on their importance for a given time interval. Simulation results show that the prediction performance of our approach does not deteriorate in high-dimensional settings and that it can approximate the true model. Application of our approach to clinical laboratory data evaluates the behavior of bilirubin and creatinine in heart failure patients before and after heart transplant, and identifies important risk factors that affect their behavior. Our approach can be implemented using the R package BoostMLR.
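The sketch below illustrates how a model of this kind might be fit in R with the BoostMLR package on simulated multivariate longitudinal data. It is a minimal, hedged example: the function and argument names (BoostMLR, predictBoostMLR, x, tm, id, y, M, nu) and the simulated data are assumptions made for illustration and should be checked against the package documentation.

```r
## Minimal sketch; function and argument names are assumed, not verified
## against the BoostMLR documentation.
library(BoostMLR)

## Simulate longitudinal data: n subjects, 3-8 visits each,
## p time-varying covariates, and q = 2 correlated responses.
set.seed(1)
n  <- 100
ni <- sample(3:8, n, replace = TRUE)                  # visits per subject
id <- rep(seq_len(n), times = ni)                     # subject identifier per row
tm <- unlist(lapply(ni, function(k) sort(runif(k))))  # measurement times
p  <- 10
x  <- matrix(rnorm(length(id) * p), ncol = p)         # time-varying covariates
y  <- cbind(y1 = x[, 1] * tm + rnorm(length(id), sd = 0.5),
            y2 = x[, 2] * (1 - tm) + rnorm(length(id), sd = 0.5))

## Fit the multivariate longitudinal boosting model
## (M = number of boosting iterations, nu = learning rate; names assumed).
fit <- BoostMLR(x = x, tm = tm, id = id, y = y, M = 200, nu = 0.05)

## In-sample prediction; a held-out set of subjects would normally be used.
pred <- predictBoostMLR(Object = fit, x = x, tm = tm, id = id, y = y)
```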



Acknowledgements

The Heart, Vascular and Thoracic Institute, Cleveland Clinic, provided funding for this research.

Author information

Corresponding author: Amol Pande.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 553 KB)


About this article


Cite this article

Pande, A., Ishwaran, H. & Blackstone, E. Boosting for Multivariate Longitudinal Responses. SN COMPUT. SCI. 3, 186 (2022). https://doi.org/10.1007/s42979-022-01072-6

