ICD Code Retrieval: Novel Approach for Assisted Disease Classification

Rizzo, Stefano Giovanni; Montesi, Danilo; Fabbri, Andrea; Marchesini, Giulio

doi:10.1007/978-3-319-21843-4_12

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9162)

Included in the following conference series:

International Conference on Data Integration in the Life Sciences

Abstract

Assigning classification codes to short medical texts is a hard text classification problem, especially when the set of possible codes is as large as ICD-9-CM. The problem, which has been only partially solved even for a subset of ICD-9-CM, becomes harder still in real-world applications, where labeled data are scarce and noisy. In this paper we first show the ineffectiveness of current text classification algorithms on large datasets. We then present a novel incremental approach to clinical text classification that overcomes the low-accuracy problem through top-K retrieval, exploits transfer learning techniques to expand a skewed dataset, and improves overall accuracy over time by learning from user selections.
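The idea of top-K retrieval with learning from user selection can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TopKCodeRetriever`, its methods, the TF-IDF cosine scoring, and the placeholder code labels are all assumptions made for illustration, and the transfer learning step is not covered. Each labeled text is indexed as a document, a query returns the K best-scoring distinct codes, and a code picked by the user is fed back into the index as a new labeled example.

```python
import math
from collections import Counter, defaultdict


class TopKCodeRetriever:
    """Toy index (hypothetical): labeled short texts act as documents."""

    def __init__(self):
        self.docs = []       # list of (token counts, code)
        self.df = Counter()  # document frequency of each token

    def add(self, text, code):
        """Index one labeled text."""
        tokens = Counter(text.lower().split())
        self.docs.append((tokens, code))
        self.df.update(tokens.keys())

    def record_selection(self, text, code):
        """Assumed feedback step: the user's choice becomes a new example."""
        self.add(text, code)

    def _weights(self, tokens):
        # Smoothed TF-IDF weights; unseen tokens get the highest idf.
        n = len(self.docs)
        return {
            t: tf * (math.log((1 + n) / (1 + self.df[t])) + 1.0)
            for t, tf in tokens.items()
        }

    def top_k(self, text, k=3):
        """Return up to k distinct codes ranked by best cosine similarity."""
        query = self._weights(Counter(text.lower().split()))
        q_norm = math.sqrt(sum(w * w for w in query.values()))
        best = defaultdict(float)
        for tokens, code in self.docs:
            doc = self._weights(tokens)
            dot = sum(w * doc.get(t, 0.0) for t, w in query.items())
            if dot > 0:
                d_norm = math.sqrt(sum(w * w for w in doc.values()))
                best[code] = max(best[code], dot / (q_norm * d_norm))
        # Sort by descending score, breaking ties by code for determinism.
        ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
        return [code for code, _ in ranked[:k]]
```

Returning a ranked list instead of a single label lets the correct code count as a hit whenever it appears among the K candidates, and each recorded selection adds one more labeled example for later queries.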

The presentation of this work has been partly funded by FIRB project Information monitoring, propagation analysis and community detection in Social Network Sites.

Notes

1. http://lucene.apache.org

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Bologna, Mura Anteo Zamboni 7, 40127, Bologna, Italy
Stefano Giovanni Rizzo & Danilo Montesi
Local Public Health Unit of Forlì, Emergency Department, Hospital Morgagni-Pierantoni, via Forlanini 34, 40121, Forlì, Italy
Andrea Fabbri
Department of Medicine, University of Bologna, via Massarenti 9, 40138, Bologna, Italy
Giulio Marchesini

Corresponding author

Correspondence to Stefano Giovanni Rizzo.

Editor information

Editors and Affiliations

USC Stevens Neuroimaging and Informatics Institute, Los Angeles, California, USA
Naveen Ashish
USC Information Sciences Institute, Marina del Rey, California, USA
Jose-Luis Ambite

Cite this paper

Rizzo, S.G., Montesi, D., Fabbri, A., Marchesini, G. (2015). ICD Code Retrieval: Novel Approach for Assisted Disease Classification. In: Ashish, N., Ambite, JL. (eds) Data Integration in the Life Sciences. DILS 2015. Lecture Notes in Computer Science, vol 9162. Springer, Cham. https://doi.org/10.1007/978-3-319-21843-4_12

DOI: https://doi.org/10.1007/978-3-319-21843-4_12
Published: 08 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21842-7
Online ISBN: 978-3-319-21843-4
eBook Packages: Computer Science, Computer Science (R0)
