Abstract
Clinical predictive models are vulnerable to performance degradation when the distribution of the data changes (distribution divergence) at application time. Significant reductions in model performance can lead to suboptimal medical decisions and patient harm. Distribution divergence in healthcare data can arise from changes in medical practice, patient demographics, equipment, and measurement standards. However, estimating model performance at application time is challenging when labels are not readily available, which is often the case in healthcare. One solution to this challenge is to develop unsupervised measures of distribution divergence that are predictive of changes in the performance of clinical models. In this article, we investigate how well divergence metrics that can be computed without labels estimate model performance under conditions of distribution divergence. In particular, we examine two popular integral probability metrics, namely the Wasserstein distance and maximum mean discrepancy, and measure their correlation with model performance in the context of predicting mortality and prolonged stay in the intensive care unit (ICU). When models were trained on data from one hospital's ICU and evaluated on data from ICUs in other hospitals, model performance was significantly correlated with the degree of divergence across hospitals as measured by the distribution divergence metrics. Moreover, regression models could predict model performance from the divergence metrics with small errors.
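The core idea — quantifying how far a target hospital's unlabeled data has drifted from the training data — can be sketched in a few lines. This is a minimal NumPy/SciPy illustration, not the authors' pipeline: the RBF bandwidth `gamma`, the synthetic Gaussian data, and the averaging of per-feature 1-D Wasserstein distances are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import cdist

def mmd_rbf(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy between samples X and Y
    under an RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    Kxx = np.exp(-gamma * cdist(X, X, "sqeuclidean"))
    Kyy = np.exp(-gamma * cdist(Y, Y, "sqeuclidean"))
    Kxy = np.exp(-gamma * cdist(X, Y, "sqeuclidean"))
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

# Stand-ins for feature matrices from a source and a target hospital:
# same feature space, target shifted by 0.5 in every feature.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(500, 4))
target = rng.normal(0.5, 1.0, size=(500, 4))

mmd2 = mmd_rbf(source, target)

# 1-D Wasserstein distance per feature, averaged across features
# (one simple way to get a single scalar for multivariate data).
w1 = np.mean([wasserstein_distance(source[:, j], target[:, j])
              for j in range(source.shape[1])])
```

Neither quantity requires outcome labels from the target hospital. Once such divergences are computed for several source–target pairs, a simple regression of held-out model performance (e.g., AUC) on the divergence values yields the kind of performance predictor the abstract describes.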
Acknowledgements
The research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under award number R01 LM012095, and a Provost Fellowship in Intelligent Systems at the University of Pittsburgh (awarded to M.T.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Tajgardoon, M., Visweswaran, S. (2021). Using Distribution Divergence to Predict Changes in the Performance of Clinical Predictive Models. In: Tucker, A., Henriques Abreu, P., Cardoso, J., Pereira Rodrigues, P., Riaño, D. (eds) Artificial Intelligence in Medicine. AIME 2021. Lecture Notes in Computer Science, vol 12721. Springer, Cham. https://doi.org/10.1007/978-3-030-77211-6_14
DOI: https://doi.org/10.1007/978-3-030-77211-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77210-9
Online ISBN: 978-3-030-77211-6
eBook Packages: Computer Science, Computer Science (R0)