research-article
Open access

Flexible Modelling of Longitudinal Medical Data: A Bayesian Nonparametric Approach

Published: 02 March 2020

Abstract

Using electronic medical records to learn personalized risk trajectories poses significant challenges because often very few samples are available in a patient’s history, and, when available, their information content is highly diverse. In this article, we consider how to integrate sparsely sampled longitudinal data, missing measurements informative of the underlying health status, and static information to estimate (dynamically, as new information becomes available) personalized survival distributions. We achieve this by developing a nonparametric probabilistic model that generates survival trajectories, and corresponding uncertainty estimates, from an ensemble of Bayesian trees in which time is incorporated explicitly to learn variable interactions over time, without needing to specify the longitudinal process beforehand. As such, the changing influence on survival of variables over time is inferred from the data directly, which we analyze with post-processing statistics derived from our model.
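
To make the idea of incorporating time explicitly more concrete, the following is a minimal sketch of one common construction, the discrete-time (person-period) expansion. Each patient contributes one row per time interval at risk, with the interval index as an ordinary covariate and a binary event indicator as the response. A binary classifier fitted to these rows (for instance a tree ensemble) estimates the hazard, and the survival curve follows as the running product of one minus the hazard. This illustrates the general technique only and is not a reproduction of the article's model; the function names `person_period` and `survival_from_hazard` and all inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def person_period(
    event_time: int, observed: bool, covariates: Sequence[float]
) -> List[Tuple[List[float], int]]:
    """Expand one patient into (features, label) rows, one per interval at risk.

    The interval index t is prepended to the covariates so that a learner
    can pick up interactions between time and the other variables.
    The label is 1 only in the final interval of a patient whose event
    was observed; censored patients contribute only 0 labels.
    """
    rows = []
    for t in range(1, event_time + 1):
        label = 1 if (observed and t == event_time) else 0
        rows.append(([float(t), *covariates], label))
    return rows


def survival_from_hazard(hazards: Sequence[float]) -> List[float]:
    """Return S(t_j) = prod_{k <= j} (1 - h_k) for discrete-time hazards h_k."""
    surv, running = [], 1.0
    for h in hazards:
        running *= 1.0 - h
        surv.append(running)
    return surv


if __name__ == "__main__":
    # Hypothetical patient: event observed in interval 3, one static covariate.
    for features, label in person_period(3, True, [0.5]):
        print(features, label)
    # Hypothetical per-interval hazards, e.g. predicted by a fitted classifier.
    print(survival_from_hazard([0.1, 0.2, 0.3]))
```

In this layout a new measurement simply changes the covariate values in later rows, which is one way predictions can be updated as new information becomes available.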

Published In

ACM Transactions on Computing for Healthcare  Volume 1, Issue 1
January 2020
99 pages
EISSN:2637-8051
DOI:10.1145/3386261
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 March 2020
Accepted: 01 October 2019
Received: 01 October 2019
Published in HEALTH Volume 1, Issue 1

Author Tags

  1. Bayesian nonparametrics
  2. survival analysis

Qualifiers

  • Research-article
  • Research
  • Refereed
