Abstract
This article describes information retrieval, natural language processing, and text mining of electronic patient record text, also called clinical text. Clinical text is written by physicians and nurses to document the health care process of the patient. First, we describe some characteristics of clinical text, followed by the automatic preprocessing that is necessary to make the text usable for some applications. We then describe applications for clinicians, including spelling and grammar checking and ICD-10 diagnosis code assignment, as well as applications for hospital management, such as ICD-10 diagnosis code validation and detection of adverse events such as hospital-acquired infections. Part of the preprocessing makes the clinical text useful for faceted search, although clinical text already has some keys for performing faceted search, such as gender, age, ICD-10 diagnosis codes, and ATC drug codes. Preprocessing makes use of ICD-10 codes and the SNOMED-CT textual descriptions. ICD-10 codes and SNOMED-CT are available in several languages and can be considered the modern Greek or Latin of medical language. The basic research presented here has its roots in the challenges described by the health care sector. These challenges have been partially solved in academia, and we believe the solutions will be adapted to the health care sector in real-world applications.
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Dalianis, H. (2014). Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications. In: Paltoglou, G., Loizides, F., Hansen, P. (eds) Professional Search in the Modern World. Lecture Notes in Computer Science, vol 8830. Springer, Cham. https://doi.org/10.1007/978-3-319-12511-4_8
DOI: https://doi.org/10.1007/978-3-319-12511-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12510-7
Online ISBN: 978-3-319-12511-4
eBook Packages: Computer Science (R0)