Deep Learning for ICD Coding: Looking for Medical Concepts in Clinical Documents in English and in French

Miftahutdinov, Zulfat; Tutubalina, Elena

doi:10.1007/978-3-319-98932-7_19


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11018)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages


Abstract

Medical Concept Coding (MCC) is a crucial task in biomedical information extraction. Recent advances have demonstrated the usefulness of neural network models in natural language processing tasks. The modern sequence-to-sequence learning framework, initially developed with recurrent neural networks, has been shown to provide powerful solutions to tasks such as named entity recognition and medical concept coding. We address the identification of clinical concepts from the International Classification of Diseases, 10th revision (ICD-10) in two benchmark data sets of death certificates provided for Task 1 of the CLEF eHealth 2017 shared task. The proposed architecture combines ideas from recurrent neural networks and traditional text retrieval term weighting schemes. Our models achieve F-measures of 75% on the CépiDc corpus of French texts and 86% on the CDC corpus of English texts. The proposed models can be employed for coding electronic medical records with ICD codes, including diagnosis and procedure codes.
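The abstract names two technical ingredients without detailing them: a term weighting scheme from text retrieval and an F-measure evaluation. The Python sketch below illustrates both under stated assumptions and is not the authors' model. It assumes the term-weighting side can be pictured as TF-IDF cosine similarity between a certificate line and ICD-10 code descriptions (using one common smoothed IDF variant), and it computes a micro-averaged F-measure over per-line sets of codes. The toy dictionary `ICD_TOY`, its simplified descriptions, and all helper names (`rank_codes`, `micro_f1`, and so on) are hypothetical; the recurrent network part of the architecture is omitted entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (accented letters included, digits dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_idf(documents):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1 (assumed variant)."""
    docs = list(documents)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Raw term frequency times IDF; terms absent from the dictionary are ignored."""
    tf = Counter(tokenize(text))
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_codes(line, dictionary):
    """Return (code, similarity) pairs sorted by decreasing similarity to the line."""
    idf = build_idf(dictionary.values())
    query = tfidf_vector(line, idf)
    scores = {code: cosine(query, tfidf_vector(desc, idf)) for code, desc in dictionary.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def micro_f1(gold, predicted):
    """Micro-averaged F-measure over aligned lists of per-line code sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy dictionary with simplified ICD-10 descriptions (illustrative only).
ICD_TOY = {
    "I21.9": "acute myocardial infarction unspecified",
    "J18.9": "pneumonia unspecified",
    "C34.9": "malignant neoplasm of bronchus or lung unspecified",
}

if __name__ == "__main__":
    print(rank_codes("acute myocardial infarction", ICD_TOY)[0][0])  # I21.9
    gold = [{"I21.9"}, {"J18.9", "C34.9"}]
    pred = [{"I21.9"}, {"J18.9"}]
    print(round(micro_f1(gold, pred), 3))  # precision 1.0, recall 2/3 -> 0.8
```

Ranking a line by lexical similarity alone is only a baseline; the abstract states that the proposed architecture combines term weighting with recurrent neural networks, and that combination is not reproduced here.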

Notes

1. https://github.com/dartrevan/clef_2017

Acknowledgements

This work was supported by the Russian Science Foundation grant no. 18-11-00284. The authors are grateful to Prof. Alexander Tropsha and Valentin Malykh for useful discussions about this study.

Author information

Authors and Affiliations

Kazan (Volga Region) Federal University, Kazan, Russia
Zulfat Miftahutdinov & Elena Tutubalina
Neuromation OU, 10111, Tallinn, Estonia
Zulfat Miftahutdinov & Elena Tutubalina

Corresponding author

Correspondence to Elena Tutubalina.

Editor information

Editors and Affiliations

Aix-Marseille University, Marseille Cedex 20, France
Patrice Bellot
Virtual University of Tunis, Tunis, Tunisia
Chiraz Trabelsi
Systèmes d’informations, Big Data et Rec, Institut de Recherche en Informatique de, Toulouse Cedex 04, France
Josiane Mothe
Department of Computer Science, University of Huddersfield, Huddersfield, United Kingdom
Fionn Murtagh
DIRO, Université de Montréal, Montréal, Québec, Canada
Jian Yun Nie
Pierre and Marie Curie University, Paris Cedex 05, France
Laure Soulier
Université d'Avignon et des Pays de, Avignon, France
Eric SanJuan
Department of Information Engineering, University of Padua, Padua, Padova, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro

About this paper

Cite this paper

Miftahutdinov, Z., Tutubalina, E. (2018). Deep Learning for ICD Coding: Looking for Medical Concepts in Clinical Documents in English and in French. In: Bellot, P., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2018. Lecture Notes in Computer Science, vol 11018. Springer, Cham. https://doi.org/10.1007/978-3-319-98932-7_19

DOI: https://doi.org/10.1007/978-3-319-98932-7_19
Published: 15 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98931-0
Online ISBN: 978-3-319-98932-7
eBook Packages: Computer Science, Computer Science (R0)