Using Distribution Divergence to Predict Changes in the Performance of Clinical Predictive Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12721)

Abstract

Clinical predictive models are vulnerable to degradation in performance due to changes in the distribution of the data (distribution divergence) at application time. Significant reductions in model performance can lead to suboptimal medical decisions and harm to patients. Distribution divergence in healthcare data can arise from changes in medical practice, patient demographics, equipment, and measurement standards. However, estimating model performance at application time is challenging when labels are not readily available, which is often the case in healthcare. One solution to this challenge is to develop unsupervised methods of measuring distribution divergence that are predictive of changes in performance of clinical models. In this article, we investigate the capability of divergence metrics that can be computed without labels in estimating model performance under conditions of distribution divergence. In particular, we examine two popular integral probability metrics, i.e., Wasserstein distance and maximum mean discrepancy, and measure their correlation with model performance in the context of predicting mortality and prolonged stay in the intensive care unit (ICU). When models were trained on data from one hospital’s ICU and assessed on data from ICUs in other hospitals, model performance was significantly correlated with the degree of divergence across hospitals as measured by the distribution divergence metrics. Moreover, regression models could predict model performance from divergence metrics with small errors.
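As a rough illustration of the unsupervised setting the abstract describes, the two integral probability metrics can be computed from feature matrices of two cohorts without any labels. The sketch below is not the authors' implementation: the synthetic cohorts, the RBF bandwidth `gamma`, and the per-feature averaging of one-dimensional Wasserstein distances are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def mmd_rbf(x, y, gamma=1.0):
    """Squared maximum mean discrepancy between samples x and y
    under an RBF kernel (biased V-statistic estimator)."""
    def k(a, b):
        # Pairwise squared Euclidean distances -> RBF kernel matrix
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
# Stand-ins for feature matrices from two hospitals' ICUs
source = rng.normal(0.0, 1.0, size=(500, 4))   # "training" hospital
target = rng.normal(0.5, 1.2, size=(500, 4))   # "deployment" hospital (shifted)

# One-dimensional Wasserstein distance per feature, averaged across features
w = np.mean([wasserstein_distance(source[:, j], target[:, j]) for j in range(4)])
m = mmd_rbf(source, target)
```

In the study's setup, divergence scores like `w` and `m` would then serve as inputs to a regression model fit to predict held-out model performance across hospitals.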



Acknowledgements

The research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under award number R01 LM012095, and a Provost Fellowship in Intelligent Systems at the University of Pittsburgh (awarded to M.T.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Correspondence to Mohammadamin Tajgardoon.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Tajgardoon, M., Visweswaran, S. (2021). Using Distribution Divergence to Predict Changes in the Performance of Clinical Predictive Models. In: Tucker, A., Henriques Abreu, P., Cardoso, J., Pereira Rodrigues, P., Riaño, D. (eds) Artificial Intelligence in Medicine. AIME 2021. Lecture Notes in Computer Science(), vol 12721. Springer, Cham. https://doi.org/10.1007/978-3-030-77211-6_14

  • DOI: https://doi.org/10.1007/978-3-030-77211-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77210-9

  • Online ISBN: 978-3-030-77211-6

  • eBook Packages: Computer Science (R0)
