
Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications

Chapter in: Professional Search in the Modern World

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8830)

Abstract

This article describes information retrieval, natural language processing and text mining of electronic patient record text, also called clinical text. Clinical text is written by physicians and nurses to document the health care process of the patient. First, we describe some characteristics of clinical text, followed by the automatic preprocessing that is necessary to make the text usable for certain applications. We then describe applications for clinicians, including spelling and grammar checking and ICD-10 diagnosis code assignment, as well as applications for hospital management, such as ICD-10 diagnosis code validation and the detection of adverse events such as hospital-acquired infections. Part of the preprocessing makes clinical text usable for faceted search, although patient records already carry some keys for faceted search, such as gender, age, ICD-10 diagnosis codes and ATC drug codes. The preprocessing makes use of ICD-10 codes and the SNOMED-CT textual descriptions. ICD-10 codes and SNOMED-CT are available in several languages and can be considered the modern Greek or Latin of medical language. The basic research presented here has its roots in challenges raised by the health care sector. These challenges have been partially solved in academia, and we believe the solutions will be adapted for real-world applications in the health care sector.
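A minimal sketch of the faceted-search idea in the abstract, assuming a simple in-memory setup: the structured keys already attached to each record (gender, age, ICD-10 and ATC codes) are combined with concepts found in the free text by whole-word dictionary lookup, and all of them are then used as facets for counting and filtering. Every name below (`TERM_DICTIONARY`, `PatientRecord`, `extract_concepts`, `facet_counts`, `filter_records`) is hypothetical, the four-entry dictionary is only a stand-in for SNOMED-CT textual descriptions, and the example records are invented. This illustrates the general technique, not the system described in the chapter.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stand-in for SNOMED-CT textual descriptions: lower-cased
# surface phrase -> concept label. A real system would load the full
# terminology (e.g. its Swedish descriptions) instead of four entries.
TERM_DICTIONARY = {
    "pneumonia": "Pneumonia",
    "fever": "Fever",
    "cough": "Cough",
    "urinary tract infection": "Urinary tract infection",
}


@dataclass
class PatientRecord:
    record_id: str
    gender: str
    age: int
    icd10: list[str]   # structured ICD-10 diagnosis codes
    atc: list[str]     # structured ATC drug codes
    text: str          # free-text clinical note
    concepts: set[str] = field(default_factory=set)  # filled by preprocess()

    @property
    def age_band(self) -> str:
        # Bucket exact ages into ten-year bands so age works as a facet.
        low = (self.age // 10) * 10
        return f"{low}-{low + 9}"


def extract_concepts(text: str) -> set[str]:
    """Whole-word dictionary lookup; no spelling correction or negation handling."""
    lowered = text.lower()
    return {
        label
        for phrase, label in TERM_DICTIONARY.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    }


def preprocess(records: list[PatientRecord]) -> None:
    # Attach text-derived concepts so they can be searched like structured keys.
    for record in records:
        record.concepts = extract_concepts(record.text)


def facet_counts(records: list[PatientRecord], facet: str) -> Counter:
    # Count how many records carry each value of a facet.
    counts: Counter = Counter()
    for record in records:
        value = getattr(record, facet)
        if isinstance(value, (list, set)):
            counts.update(set(value))  # multi-valued facet: each value once per record
        else:
            counts[value] += 1
    return counts


def _matches(value, wanted) -> bool:
    # Multi-valued facets (code lists, concept sets) match on membership.
    if isinstance(value, (list, set)):
        return wanted in value
    return value == wanted


def filter_records(records: list[PatientRecord], **facets) -> list[PatientRecord]:
    # AND semantics: a record must match every requested facet value.
    return [
        r for r in records
        if all(_matches(getattr(r, name), wanted) for name, wanted in facets.items())
    ]


# Invented example records, for illustration only.
records = [
    PatientRecord("r1", "F", 67, ["J18.9"], ["J01CA04"],
                  "Admitted with fever and productive cough. X-ray consistent with pneumonia."),
    PatientRecord("r2", "M", 72, ["N39.0"], ["J01MA02"],
                  "Urinary tract infection, started on antibiotics. Fever resolved."),
    PatientRecord("r3", "F", 64, ["I10"], [],
                  "Routine follow-up. Blood pressure stable, no complaints."),
]
preprocess(records)

print(facet_counts(records, "age_band"))
print([r.record_id for r in filter_records(records, gender="F", concepts="Pneumonia")])
```

Record r2 is tagged with the concept Fever even though its note says the fever resolved: plain dictionary lookup cannot separate asserted findings from negated or resolved ones, which is one reason preprocessing of clinical text needs more than term matching before the extracted concepts are trusted as facets.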




Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Dalianis, H. (2014). Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications. In: Paltoglou, G., Loizides, F., Hansen, P. (eds) Professional Search in the Modern World. Lecture Notes in Computer Science, vol 8830. Springer, Cham. https://doi.org/10.1007/978-3-319-12511-4_8


  • DOI: https://doi.org/10.1007/978-3-319-12511-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12510-7

  • Online ISBN: 978-3-319-12511-4

  • eBook Packages: Computer Science (R0)
