Abstract
This paper addresses the semi-automatic encoding of patient discharge summaries into medical classifications such as ICD-9-CM. The methods detailed here are symbolic approaches, which allow unannotated corpora to be processed without any machine learning. The first method is based on the morphological analysis (MA) of medical terms extracted with hand-crafted linguistic resources. The second method (ELP) relies on the automatic extraction of variants of ICD-9-CM code labels. Each method was evaluated on a set of 19,692 discharge summaries in French from a General Internal Medicine unit. Depending on the number of suggested classes, the MA method reached a maximal F-measure of 28.00 and a maximal recall of 46.13%. For the ELP method, the best F-measure was 29.43 and the maximal recall was 52.74%. When the two methods were combined, the best recall increased to 60.21% and the maximal F-measure reached 31.64.
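As a rough illustration of how precision, recall and F-measure can depend on the number of suggested classes, and of why merging the suggestions of two methods tends to raise recall, here is a minimal Python sketch. It makes several assumptions that are not taken from the paper: scores are micro-averaged over documents, the two ranked lists are merged by simple interleaving, and the function names (`micro_prf`, `combine`), the variable names and the toy ICD-9-CM codes are all hypothetical. The paper's actual evaluation protocol and combination strategy may differ.

```python
from typing import Dict, List, Set, Tuple


def micro_prf(gold: Dict[str, Set[str]],
              suggested: Dict[str, List[str]],
              k: int) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over the top-k suggested codes.

    Assumption: micro-averaging; the paper may average differently.
    """
    tp = n_suggested = n_gold = 0
    for doc_id, gold_codes in gold.items():
        top_k = set(suggested.get(doc_id, [])[:k])
        tp += len(top_k & gold_codes)   # correctly suggested codes
        n_suggested += len(top_k)       # all codes suggested for this document
        n_gold += len(gold_codes)       # all codes assigned by human coders
    precision = tp / n_suggested if n_suggested else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def combine(first: List[str], second: List[str]) -> List[str]:
    """Interleave two ranked lists and drop duplicates.

    Assumption: a hypothetical merging strategy, not the one used in the paper.
    """
    merged: List[str] = []
    seen: Set[str] = set()
    for i in range(max(len(first), len(second))):
        for ranked in (first, second):
            if i < len(ranked) and ranked[i] not in seen:
                seen.add(ranked[i])
                merged.append(ranked[i])
    return merged


# Toy data (illustrative only): gold codes and ranked suggestions per document.
gold = {"d1": {"428.0", "401.9"}, "d2": {"250.00"}}
ma = {"d1": ["428.0", "786.50"], "d2": ["401.9"]}
elp = {"d1": ["401.9", "428.0"], "d2": ["250.00", "401.9"]}
both = {d: combine(ma.get(d, []), elp.get(d, [])) for d in gold}

for name, run in (("MA", ma), ("ELP", elp), ("combined", both)):
    for k in (1, 2, 3):
        p, r, f = micro_prf(gold, run, k)
        print(f"{name:8s} k={k}  P={p:.2f}  R={r:.2f}  F={f:.2f}")
```

In this sketch, raising `k` can only keep recall equal or increase it, while precision usually drops, which is why the best F-measure and the best recall are typically reached at different numbers of suggested classes.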
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kevers, L., Medori, J. (2010). Symbolic Classification Methods for Patient Discharge Summaries Encoding into ICD. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_23
DOI: https://doi.org/10.1007/978-3-642-14770-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14769-2
Online ISBN: 978-3-642-14770-8
eBook Packages: Computer Science, Computer Science (R0)