AutoMap: Automatic Medical Code Mapping for Clinical Prediction Model Deployment

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13714)

Abstract

Given a deep learning model trained on data from a source hospital, how can it be deployed to a target hospital automatically? How can heterogeneous medical coding systems across hospitals be accommodated? Standard approaches rely on existing medical code mapping tools, which have several practical limitations.

To tackle this problem, we propose AutoMap to automatically map the medical codes across different EHR systems in a coarse-to-fine manner: (1) Ontology-level Alignment: We leverage the ontology structure to learn a coarse alignment between the source and target medical coding systems; (2) Code-level Refinement: We refine the alignment at a fine-grained code level for the downstream tasks using a teacher-student framework.
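As a rough illustration of what a coarse alignment between two code embedding spaces can look like, the sketch below fits an orthogonal map with the classical orthogonal Procrustes solution and then retrieves target codes by cosine nearest neighbour. This is not AutoMap's implementation: the embeddings are synthetic, the anchor pairs are a hypothetical stand-in for whatever ontology-level correspondence is available, and the names `procrustes_rotation` and `unit_rows` are invented for this example.

```python
import numpy as np


def procrustes_rotation(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal W that minimises ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def unit_rows(x: np.ndarray) -> np.ndarray:
    """Scale every row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Synthetic data (assumption): the source space is a rotated, slightly noisy
# copy of the target space, i.e. the two spaces are approximately isometric.
rng = np.random.default_rng(0)
dim, n_anchor = 8, 50
tgt_anchor = rng.normal(size=(n_anchor, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_anchor = tgt_anchor @ rotation.T + 0.01 * rng.normal(size=(n_anchor, dim))

# Fit the map on the anchor pairs, then map source codes into the target space.
W = procrustes_rotation(src_anchor, tgt_anchor)
mapped = src_anchor @ W

# Retrieve the closest target code for every mapped source code.
similarity = unit_rows(mapped) @ unit_rows(tgt_anchor).T
predicted = similarity.argmax(axis=1)
accuracy = float((predicted == np.arange(n_anchor)).mean())
print(f"retrieval accuracy on anchors: {accuracy:.2f}")
```

An orthogonal map preserves distances and angles, so it can only succeed when the two spaces have similar geometry. A code-level refinement step, such as the teacher-student framework described above, would start from a coarse map of this kind rather than replace it.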

We evaluate AutoMap using several deep learning models with two real-world EHR datasets: eICU and MIMIC-III. Results show that AutoMap achieves relative improvements of up to 3.9% (AUC-ROC) and 8.7% (AUC-PR) for mortality prediction, and up to 4.7% (AUC-ROC) and 3.7% (F1) for length-of-stay estimation. Further, we show that AutoMap can provide accurate mapping across coding systems. Lastly, we demonstrate that AutoMap can adapt to two challenging scenarios: (1) mapping between completely different coding systems and (2) mapping between completely different hospitals.
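For readers less familiar with the term, a relative improvement is the gain expressed as a percentage of the baseline score, not the absolute difference in metric points. The helper below is a minimal sketch of that arithmetic; the function name and the two scores are made up for illustration and are not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


# Hypothetical scores: a baseline AUC-ROC of 0.80 raised to 0.82.
example = relative_improvement(0.80, 0.82)
print(f"relative improvement: {example:.1f}%")
```

In this made-up case the absolute gain is 0.02 AUC-ROC points, which corresponds to a relative improvement of roughly 2.5%.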

Notes

  1. In practice, the isometry requirement will not hold exactly, but it can be assumed to hold approximately, or the problem of mapping two code embedding spaces without supervision would be impossible.

  2. https://github.com/zzachw/AutoMap

Author information

Corresponding author

Correspondence to Jimeng Sun.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, Z., Xiao, C., Glass, L.M., Liebovitz, D.M., Sun, J. (2023). AutoMap: Automatic Medical Code Mapping for Clinical Prediction Model Deployment. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13714. Springer, Cham. https://doi.org/10.1007/978-3-031-26390-3_29

  • DOI: https://doi.org/10.1007/978-3-031-26390-3_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26389-7

  • Online ISBN: 978-3-031-26390-3

  • eBook Packages: Computer Science (R0)
