Abstract
Boosting, a machine learning approach, has gained popularity in applications to many types of data, including longitudinal data. However, its application to data with multivariate responses remains limited. In this article, we present a new approach that applies gradient boosting, a generic form of boosting, to model multivariate longitudinal responses. Our approach can handle time-varying covariates as well as high-dimensional covariates and responses, even when some of the covariates and responses are pure noise. A key feature of our approach is that it selects covariates that affect responses differently over different time intervals; the overall effect of any covariate can thereby be dissected and represented as a function of time. A novel feature is that, in addition to covariate selection, we also perform response selection for different time intervals, which helps to identify and rank responses by their importance within a given time interval. Simulation results show that the prediction performance of our approach does not deteriorate in high-dimensional settings and that the fitted model can approximate the true model. Applying our approach to clinical laboratory data, we evaluate the behavior of bilirubin and creatinine in heart failure patients before and after heart transplant and identify important risk factors that affect their behavior. Our approach can be implemented using the R package BoostMLR.
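To make the idea of joint covariate and response selection concrete, the following is a minimal sketch of componentwise L2 gradient boosting with several responses. It is not the BoostMLR algorithm: it ignores the longitudinal structure and the time-varying effects described above, and the function name `componentwise_l2_boost`, the step size `nu`, and the simulated data are all assumptions made purely for illustration. At each step, the sketch picks the single covariate-response pair that most reduces the squared error and updates only that coefficient by a small amount.

```python
import numpy as np

def componentwise_l2_boost(X, Y, n_steps=50, nu=0.1):
    """Illustrative multivariate componentwise L2 boosting (hypothetical helper).

    X: (n, p) covariate matrix, Y: (n, q) response matrix.
    Returns the coefficient matrix B (p, q) and a (p, q) matrix counting
    how often each covariate-response pair was selected.
    """
    n, p = X.shape
    q = Y.shape[1]
    B = np.zeros((p, q))
    counts = np.zeros((p, q), dtype=int)
    col_ss = (X ** 2).sum(axis=0)            # x_k' x_k for each covariate
    for _ in range(n_steps):
        R = Y - X @ B                        # residuals (negative gradient of squared-error loss)
        G = X.T @ R                          # inner products x_k' r_j, shape (p, q)
        coef = G / col_ss[:, None]           # least-squares slope of r_j on x_k
        gain = G ** 2 / col_ss[:, None]      # reduction in squared error for each pair
        k, j = np.unravel_index(np.argmax(gain), gain.shape)
        B[k, j] += nu * coef[k, j]           # shrunken update of the best pair only
        counts[k, j] += 1
    return B, counts

# Simulated example (assumed setup): 50 covariates, 4 responses,
# only two covariate-response pairs carry signal; responses 2 and 3 are pure noise.
rng = np.random.default_rng(0)
n, p, q = 200, 50, 4
X = rng.standard_normal((n, p))
B_true = np.zeros((p, q))
B_true[0, 0] = 2.0
B_true[1, 1] = -1.5
Y = X @ B_true + 0.5 * rng.standard_normal((n, q))

B, counts = componentwise_l2_boost(X, Y, n_steps=50, nu=0.1)
response_importance = counts.sum(axis=0)     # how often each response was updated
```

In this sketch, the selection counts give a crude ranking of responses and covariates; an approach with time-varying effects would instead expand each covariate over a basis in time, which is omitted here.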







Acknowledgements
Heart, Vascular and Thoracic Institute, Cleveland Clinic provided funding for this research.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Cite this article
Pande, A., Ishwaran, H. & Blackstone, E. Boosting for Multivariate Longitudinal Responses. SN COMPUT. SCI. 3, 186 (2022). https://doi.org/10.1007/s42979-022-01072-6
DOI: https://doi.org/10.1007/s42979-022-01072-6