Diagnostic Code Group Prediction by Integrating Structured and Unstructured Clinical Data

  • Conference paper
Big Data Analytics (BDA 2021)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 13147))

Abstract

Diagnostic coding is the process by which written, verbal, and other patient-case documentation is used to enable disease prediction, accurate record keeping, and insurance settlements. It remains a largely manual process even in countries that have successfully adopted Electronic Health Record (EHR) systems. The problem is worse in developing countries, where EHR adoption is not yet on par with that of Western countries. EHRs contain a wealth of patient information in numerical, text, and image formats. A disease prediction model that exploits all of this information to enable faster and more accurate diagnosis would be beneficial. We address this task by proposing mixed ensemble models that combine boosting and deep learning architectures for diagnostic code group prediction. The models are trained on a dataset created by integrating features from structured data (lab test reports) and unstructured data (clinical text). We evaluate the proposed models on MIMIC-III, an open dataset of clinical data, using standard multi-label metrics. Empirical evaluation showed that our approach performs strongly on this task compared with state-of-the-art works that rely on a single data source. Our novelty lies in effectively integrating relevant information from both data sources, which ensures larger ICD-9 code coverage; in handling the inherent class imbalance; and in adopting a novel approach to forming the ensemble models.
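To make the pipeline described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes early fusion by simple concatenation of per-admission feature vectors, and it uses plain soft voting (averaging per-label probabilities) as a stand-in for the paper's ensemble method, which the abstract does not specify. All function names, the 0.5 threshold, and the toy probabilities are hypothetical. Hamming loss and micro-averaged F1 are shown as two common multi-label metrics; the abstract does not say which metrics the paper reports.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Labels = List[List[int]]


def concat_features(structured: Sequence[Sequence[float]],
                    text: Sequence[Sequence[float]]) -> Matrix:
    """Early fusion (assumed): join lab-test and text features per admission."""
    if len(structured) != len(text):
        raise ValueError("both sources must cover the same admissions")
    return [list(s) + list(t) for s, t in zip(structured, text)]


def soft_vote(prob_sets: Sequence[Matrix], threshold: float = 0.5) -> Labels:
    """Average per-label probabilities across models, then threshold."""
    n_models = len(prob_sets)
    preds: Labels = []
    for i in range(len(prob_sets[0])):
        n_labels = len(prob_sets[0][i])
        avg = [sum(p[i][j] for p in prob_sets) / n_models
               for j in range(n_labels)]
        preds.append([1 if a >= threshold else 0 for a in avg])
    return preds


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of (sample, label) pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for tr, pr in zip(y_true, y_pred)
                for t, p in zip(tr, pr))
    return wrong / total


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 computed from true/false positives pooled over all labels."""
    tp = fp = fn = 0
    for tr, pr in zip(y_true, y_pred):
        for t, p in zip(tr, pr):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: 3 admissions, 3 hypothetical code groups.
fused = concat_features([[1.2, 0.4], [0.7, 0.9], [0.3, 0.1]],
                        [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])

# Made-up probabilities standing in for a boosting model and a deep model.
boosting_probs = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.7], [0.6, 0.3, 0.2]]
deep_probs = [[0.7, 0.4, 0.8], [0.3, 0.6, 0.1], [0.2, 0.1, 0.6]]

y_true = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
y_pred = soft_vote([boosting_probs, deep_probs])

print(y_pred)
print(hamming_loss(y_true, y_pred), micro_f1(y_true, y_pred))
```

On this toy input the averaged vote misses two of the five positive labels and adds no false positives. Micro-averaging pools counts over all code groups, so frequent groups dominate the score; under class imbalance a macro-averaged variant would weight rare groups equally.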

A. Prabhakar and S. Srinivasan—Equal contribution.

G. S. Krishnan—Author contributed to this work as part of Ph.D. research in HALE Lab, NITK.

Author information

Corresponding author

Correspondence to Akshara Prabhakar.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Prabhakar, A., Srinivasan, S., Krishnan, G.S., Kamath, S.S. (2021). Diagnostic Code Group Prediction by Integrating Structured and Unstructured Clinical Data. In: Srirama, S.N., Lin, J.CW., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds) Big Data Analytics. BDA 2021. Lecture Notes in Computer Science, vol 13147. Springer, Cham. https://doi.org/10.1007/978-3-030-93620-4_15

  • DOI: https://doi.org/10.1007/978-3-030-93620-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93619-8

  • Online ISBN: 978-3-030-93620-4

  • eBook Packages: Computer Science, Computer Science (R0)
