Unstructured Data in Predictive Process Monitoring: Lexicographic and Semantic Mapping to ICD-9-CM Codes for the Home Hospitalization Service

  • Conference paper
AIxIA 2021 – Advances in Artificial Intelligence (AIxIA 2021)

Abstract

The wide availability of hospital administrative and clinical data has encouraged the application of Process Mining techniques to the healthcare domain. Predictive Process Monitoring techniques can learn from data about past executions and predict the future of incomplete cases. However, some of these data, possibly the most informative ones, are often available only as natural language text, whereas structured information extracted from them would be more beneficial for training predictive models.

In this paper we focus on the scenario of the Home Hospitalization Service, supporting the team in deciding on the home hospitalization of a patient by predicting whether a new patient is likely to undergo home hospitalization successfully. We investigate whether, in this scenario, mapping the unstructured textual diagnoses reported by the doctor in the Emergency Department into structured information, such as standardized ICD-9-CM disease codes, yields more accurate predictions. To this end, we devise two approaches, based respectively on lexicographic and semantic distance, for mapping textual diagnoses to ICD-9-CM codes, and we leverage the resulting structured information for making predictions.
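To make the idea of a lexicographic mapping concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which distance measure is used, so the similarity ratio from Python's standard `difflib` module stands in for it here. The four-entry catalogue, the `normalize` and `map_diagnosis` helpers, and the `threshold` value are all assumptions introduced for illustration; a real setting would use the full ICD-9-CM description list in the language of the diagnoses.

```python
from difflib import SequenceMatcher

# Illustrative catalogue only: a few ICD-9-CM codes with English descriptions.
ICD9_CATALOGUE = {
    "486": "pneumonia, organism unspecified",
    "428.0": "congestive heart failure, unspecified",
    "599.0": "urinary tract infection, site not specified",
    "496": "chronic airway obstruction, not elsewhere classified",
}


def normalize(text):
    """Lowercase the text and replace non-alphanumeric characters with spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def map_diagnosis(diagnosis, catalogue=ICD9_CATALOGUE, threshold=0.5):
    """Return (code, score) for the most similar description.

    The code is None when no description reaches the (hypothetical) threshold.
    """
    query = normalize(diagnosis)
    best_code, best_score = None, 0.0
    for code, description in catalogue.items():
        # Character-level similarity in [0, 1]; a stand-in for the paper's distance.
        score = SequenceMatcher(None, query, normalize(description)).ratio()
        if score > best_score:
            best_code, best_score = code, score
    if best_score < threshold:
        return None, best_score
    return best_code, best_score


if __name__ == "__main__":
    print(map_diagnosis("Congestive heart failure"))
    print(map_diagnosis("xyz"))
```

A semantic variant would keep the same loop but compare vector embeddings of the two strings instead of their characters; the threshold then decides which diagnoses are left unmapped.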

Notes

  1. https://www.cdc.gov/nchs/icd/icd9cm.htm.

  2. https://www.salute.gov.it/portale/documentazione/p6_2_2_1.jsp?lingua=italiano&id=2251.

  3. We used the Snowball stemmer from the NLTK package: https://www.nltk.org/_modules/nltk/stem/snowball.html.

  4. The percentages in Table 1 refer to the number of mappings per diagnosis. Note that these are in principle different from the number of mappings per trace in which the diagnosis appears, since the same diagnosis may appear in more than one trace.
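
The distinction drawn in note 4 can be illustrated with a small sketch. It follows one possible reading of the note (the share of distinct diagnoses that receive a mapping versus the share of traces whose diagnosis receives one). The `mapping_rates` helper and the toy data are hypothetical and are not taken from the paper.

```python
def mapping_rates(trace_diagnoses, mapped_diagnoses):
    """Return (rate per distinct diagnosis, rate per trace).

    trace_diagnoses: one diagnosis string per trace (repetitions allowed).
    mapped_diagnoses: set of diagnosis strings that received a code.
    """
    distinct = set(trace_diagnoses)
    per_diagnosis = sum(d in mapped_diagnoses for d in distinct) / len(distinct)
    per_trace = sum(d in mapped_diagnoses for d in trace_diagnoses) / len(trace_diagnoses)
    return per_diagnosis, per_trace


if __name__ == "__main__":
    # Hypothetical log: "pneumonia" appears in three traces, "heart failure" in one.
    traces = ["pneumonia", "pneumonia", "pneumonia", "heart failure"]
    print(mapping_rates(traces, {"pneumonia"}))
```

In this toy log one of the two distinct diagnoses is mapped, but three of the four traces carry a mapped diagnosis, so the two rates differ.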

Acknowledgments

This research has been partially carried out within the “Circular Health for Industry” project, funded by “Compagnia San Paolo” under the call “Intelligenza Artificiale, uomo e società”.

Author information

Corresponding author

Correspondence to Chiara Di Francescomarino.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ronzani, M. et al. (2022). Unstructured Data in Predictive Process Monitoring: Lexicographic and Semantic Mapping to ICD-9-CM Codes for the Home Hospitalization Service. In: Bandini, S., Gasparini, F., Mascardi, V., Palmonari, M., Vizzari, G. (eds) AIxIA 2021 – Advances in Artificial Intelligence. AIxIA 2021. Lecture Notes in Computer Science, vol 13196. Springer, Cham. https://doi.org/10.1007/978-3-031-08421-8_48

  • DOI: https://doi.org/10.1007/978-3-031-08421-8_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08420-1

  • Online ISBN: 978-3-031-08421-8

  • eBook Packages: Computer Science (R0)
