Abstract
The extremely low prevalence of rare diseases exacerbates many of the typical challenges of prognostic model development. It results both in low data availability and in difficulty procuring additional data, e.g., because of privacy concerns over the risk of patient reidentification. Yet developing prognostic models from possibly limited in-house data is often of interest for many applications (e.g., prototyping, hypothesis confirmation, exploratory analyses).
Several options exist beyond simply training a model on the available data. Data might be acquired from a larger database or, failing that, one might sidestep limitations on data sharing by resorting to simulators based, e.g., on dynamic Bayesian networks (DBNs). Additionally, transfer learning techniques might be applied to integrate external and in-house data sources.
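Transfer learning can take many forms, and the specific techniques tested are not named here. As one illustration only, the following is a minimal Python sketch of feature augmentation ("frustratingly easy domain adaptation", Daumé III). A larger external dataset plays the source domain and the in-house data the target domain. The function names are hypothetical, and any standard classifier could then be trained on the stacked rows.

```python
from typing import List, Sequence


def augment(x: Sequence[float], domain: str) -> List[float]:
    """Feature augmentation: a shared copy plus one domain-specific copy.

    Source rows become (x, x, 0) and target rows become (x, 0, x), so one
    classifier can learn weights shared by both domains plus per-domain corrections.
    """
    x = list(x)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError(f"unknown domain: {domain!r}")


def build_training_set(source_rows, target_rows) -> List[List[float]]:
    """Stack augmented source rows (e.g. a larger register) and target rows (e.g. in-house data)."""
    return [augment(r, "source") for r in source_rows] + [
        augment(r, "target") for r in target_rows
    ]


# Usage: two features per patient; the augmented rows have six features.
print(augment([1.0, 2.0], "source"))  # [1.0, 2.0, 1.0, 2.0, 0.0, 0.0]
print(augment([1.0, 2.0], "target"))  # [1.0, 2.0, 0.0, 0.0, 1.0, 2.0]
```

The appeal of this design is that it leaves the learning algorithm untouched. Only the feature space changes, so the same prognostic model family can be compared with and without external data.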
Here, we compare the effectiveness of these strategies in developing a predictive model of 3-year mortality in amyotrophic lateral sclerosis (ALS, a rare neurodegenerative disease with <0.01% prevalence) using the in-house dataset of a single ALS clinic in Milan, Italy (N = 116). We test several combinations of direct and transfer-learning-mediated model development that use additional real data from the Italian PARALS register (N = 568). We also train two DBNs, one on each dataset, and use them to simulate large numbers of virtual subjects whose variables are linked by the same probabilistic relationships as in the real data.
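Simulating virtual subjects from a DBN amounts to forward sampling: draw the first time slice from its initial distribution, then draw each later slice conditioned on its parents in the previous slice. The following is a minimal Python sketch under loudly stated assumptions. The two binary variables, the network structure, and every probability are hypothetical placeholders, not values learned from ALS data. In practice both structure and conditional probabilities would be learned from each real dataset.

```python
import random
from typing import Dict, List

# HYPOTHETICAL two-slice DBN: variable names, structure and probabilities are
# illustrative placeholders, not parameters learned from any ALS dataset.
P_IMPAIRED_0 = 0.3                           # P(impaired at t = 0)
P_IMPAIRED_NEXT = {False: 0.2, True: 0.9}    # P(impaired_{t+1} | impaired_t)
P_DEAD_NEXT = {False: 0.02, True: 0.10}      # P(dead_{t+1} | alive_t, impaired_t)


def simulate_subject(rng: random.Random, n_steps: int) -> List[Dict[str, bool]]:
    """Forward-sample one virtual subject's trajectory over n_steps transitions."""
    state = {"impaired": rng.random() < P_IMPAIRED_0, "dead": False}
    trajectory = [dict(state)]
    for _ in range(n_steps):
        if state["dead"]:
            # Death is absorbing: once dead, the subject stays dead.
            trajectory.append(dict(state))
            continue
        # Sample each node of the next slice conditioned on its parents in this slice.
        dead = rng.random() < P_DEAD_NEXT[state["impaired"]]
        impaired = rng.random() < P_IMPAIRED_NEXT[state["impaired"]]
        state = {"impaired": impaired, "dead": dead}
        trajectory.append(dict(state))
    return trajectory


def simulate_cohort(n_subjects: int, n_steps: int, seed: int = 0) -> List[List[Dict[str, bool]]]:
    """Simulate a reproducible cohort of virtual subjects from a seeded generator."""
    rng = random.Random(seed)
    return [simulate_subject(rng, n_steps) for _ in range(n_subjects)]


# Usage: if each step were one year, 3-year mortality would be the fraction dead at the last slice.
cohort = simulate_cohort(10_000, 3, seed=42)
print(sum(t[-1]["dead"] for t in cohort) / len(cohort))
```

Because the simulator samples from fitted probabilities rather than copying records, the number of virtual subjects is limited only by computation.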
We show that, compared to a baseline model developed on the smaller dataset (AUROC = 0.633), the largest performance increase was obtained by training on data simulated by a DBN trained on the larger PARALS register (AUROC = 0.734).
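For reference, an AUROC can be read as the probability that a model ranks a randomly chosen patient who experienced the outcome above a randomly chosen patient who did not, where 0.5 is chance level. The following is a minimal Python sketch of that Mann-Whitney formulation for binary labels. It is an assumption that this matches the evaluation used here, since details such as the handling of censored follow-up are not described in this abstract.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage: 3 of the 4 positive/negative pairs are ranked correctly.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic and meant only for clarity. Rank-based implementations compute the same quantity in O(n log n).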
Acknowledgements
This research was supported by the University of Padova project C94I19001730001, by the Italian Ministry of Health grant RF-2016-02362405, and by the Italian Ministry of Education, University and Research (PRIN) grant 2017SNW5MB.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Longato, E., Tavazzi, E., Chió, A., Mora, G., Sparacino, G., Di Camillo, B. (2023). Dealing with Data Scarcity in Rare Diseases: Dynamic Bayesian Networks and Transfer Learning to Develop Prognostic Models of Amyotrophic Lateral Sclerosis. In: Juarez, J.M., Marcos, M., Stiglic, G., Tucker, A. (eds) Artificial Intelligence in Medicine. AIME 2023. Lecture Notes in Computer Science, vol. 13897. Springer, Cham. https://doi.org/10.1007/978-3-031-34344-5_18
DOI: https://doi.org/10.1007/978-3-031-34344-5_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34343-8
Online ISBN: 978-3-031-34344-5
eBook Packages: Computer Science, Computer Science (R0)