AutoMap: Automatic Medical Code Mapping for Clinical Prediction Model Deployment

Wu, Zhenbang; Xiao, Cao; Glass, Lucas M.; Liebovitz, David M.; Sun, Jimeng

doi:10.1007/978-3-031-26390-3_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13714)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Given a deep learning model trained on data from a source hospital, how can it be deployed to a target hospital automatically, and how can heterogeneous medical coding systems across hospitals be accommodated? Standard approaches rely on existing medical code mapping tools, which have several practical limitations.

To tackle this problem, we propose AutoMap to automatically map the medical codes across different EHR systems in a coarse-to-fine manner: (1) Ontology-level Alignment: We leverage the ontology structure to learn a coarse alignment between the source and target medical coding systems; (2) Code-level Refinement: We refine the alignment at a fine-grained code level for the downstream tasks using a teacher-student framework.

We evaluate AutoMap using several deep learning models on two real-world EHR datasets: eICU and MIMIC-III. Results show that AutoMap achieves relative improvements of up to 3.9% (AUC-ROC) and 8.7% (AUC-PR) for mortality prediction, and up to 4.7% (AUC-ROC) and 3.7% (F1) for length-of-stay estimation. Further, we show that AutoMap can provide accurate mapping across coding systems. Lastly, we demonstrate that AutoMap can adapt to two challenging scenarios: (1) mapping between completely different coding systems and (2) mapping between completely different hospitals.

Notes

1. In practice, the isometry requirement will not hold exactly, but it can be assumed to hold approximately, or the problem of mapping two code embedding spaces without supervision would be impossible.
2. https://github.com/zzachw/AutoMap.
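
To make the isometry assumption in note 1 concrete, the following is a minimal sketch and not the AutoMap implementation. It assumes toy two-dimensional "code embeddings" whose source and target spaces differ by a rotation (one kind of isometry), and it uses known source-target pairs to fit the rotation in closed form, which is a supervised simplification of the unsupervised setting the note describes. The names `rotate`, `fit_rotation`, and `residual` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def rotate(points, theta):
    """Apply a 2-D rotation by angle theta to each (x, y) point."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def fit_rotation(source, target):
    """Least-squares rotation angle that maps source points onto target points.

    Closed-form 2-D orthogonal Procrustes restricted to rotations:
    minimizing sum ||R x_i - y_i||^2 is the same as maximizing
    sum <R x_i, y_i> = cos(theta) * dot + sin(theta) * cross.
    """
    cross = sum(xs * yt - ys * xt for (xs, ys), (xt, yt) in zip(source, target))
    dot = sum(xs * xt + ys * yt for (xs, ys), (xt, yt) in zip(source, target))
    return math.atan2(cross, dot)


def residual(source, target, theta):
    """Frobenius-style distance between the rotated source and the target."""
    mapped = rotate(source, theta)
    return math.sqrt(
        sum((mx - tx) ** 2 + (my - ty) ** 2
            for (mx, my), (tx, ty) in zip(mapped, target))
    )


# Hypothetical toy data: 50 source "embeddings" and a target space that is
# an exact rotation of the source, plus a noisy copy of that target.
rng = random.Random(0)
source = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
true_theta = 0.7
target_exact = rotate(source, true_theta)
target_noisy = [
    (x + rng.gauss(0, 0.05), y + rng.gauss(0, 0.05)) for x, y in target_exact
]

# Exact isometry: the rotation is recovered up to floating-point error.
theta_exact = fit_rotation(source, target_exact)
# Approximate isometry: the fit is close to the true angle but not exact.
theta_noisy = fit_rotation(source, target_noisy)

print("exact:", theta_exact, residual(source, target_exact, theta_exact))
print("noisy:", theta_noisy, residual(source, target_noisy, theta_noisy))
```

When the two spaces are exactly isometric the residual vanishes; when the relation holds only approximately, the fitted map is the best rotation in the least-squares sense and a nonzero residual remains.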

Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, Champaign, USA
Zhenbang Wu & Jimeng Sun
Amplitude, San Francisco, USA
Cao Xiao
IQVIA, Durham, USA
Lucas M. Glass
Northwestern University, Evanston, USA
David M. Liebovitz

Corresponding author

Correspondence to Jimeng Sun.

Editor information

Editors and Affiliations

Grenoble Alpes University, Saint Martin d'Hères, France
Massih-Reza Amini
INSA Rouen Normandy, Saint Etienne du Rouvray, France
Stéphane Canu
Ruhr-Universität Bochum, Bochum, Germany
Asja Fischer
KU Leuven, Leuven, Belgium
Tias Guns
Central European University, Vienna, Austria
Petra Kralj Novak
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas

Cite this paper

Wu, Z., Xiao, C., Glass, L.M., Liebovitz, D.M., Sun, J. (2023). AutoMap: Automatic Medical Code Mapping for Clinical Prediction Model Deployment. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13714. Springer, Cham. https://doi.org/10.1007/978-3-031-26390-3_29

DOI: https://doi.org/10.1007/978-3-031-26390-3_29
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26389-7
Online ISBN: 978-3-031-26390-3
eBook Packages: Computer Science, Computer Science (R0)
