Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications

Dalianis, Hercules

doi:10.1007/978-3-319-12511-4_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8830)

Abstract

This article describes information retrieval, natural language processing and text mining of electronic patient record text, also called clinical text. Clinical text is written by physicians and nurses to document the health care process of the patient. We first describe some characteristics of clinical text, and then the automatic preprocessing that is needed to make the text usable for some applications. We also describe some applications for clinicians, including spelling and grammar checking and ICD-10 diagnosis code assignment, as well as applications for hospital management, such as ICD-10 diagnosis code validation and detection of adverse events such as hospital-acquired infections. Part of the preprocessing makes the clinical text useful for faceted search, although clinical text already comes with some keys for faceted search, such as gender, age, ICD-10 diagnosis codes and ATC drug codes. The preprocessing makes use of ICD-10 codes and the SNOMED-CT textual descriptions. ICD-10 codes and SNOMED-CT are available in several languages and can be considered the modern Greek or Latin of medical language. The basic research presented here has its roots in the challenges described by the health care sector. These challenges have been partially solved in academia, and we believe the solutions will be adapted to the health care sector in real world applications.

Author information

Authors and Affiliations

Department of Computer and Systems Sciences, Stockholm University, P.O. Box 7003, 164 07, Kista, Sweden
Hercules Dalianis

Editor information

Editors and Affiliations

School of Technology, University of Wolverhampton, Wulfruna Street, WV1 1LY, Wolverhampton, UK
Georgios Paltoglou
Department of Multimedia and Graphic Arts, Cyprus University of Technology, Limassol, Cyprus
Fernando Loizides
Swedish Institute of Computer Science, Isafjordsgatan 22, SE-164 28, Kista, Sweden
Preben Hansen

About this chapter

Cite this chapter

Dalianis, H. (2014). Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications. In: Paltoglou, G., Loizides, F., Hansen, P. (eds) Professional Search in the Modern World. Lecture Notes in Computer Science, vol 8830. Springer, Cham. https://doi.org/10.1007/978-3-319-12511-4_8

DOI: https://doi.org/10.1007/978-3-319-12511-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12510-7
Online ISBN: 978-3-319-12511-4
eBook Packages: Computer Science, Computer Science (R0)
