Published by De Gruyter Oldenbourg, October 19, 2016

How to improve information extraction from German medical records

Johannes Starlinger, Madeleine Kittner, Oliver Blankenstein and Ulf Leser

Abstract

Vast amounts of medical information are still recorded as unstructured text. The knowledge contained in this textual data has a great potential to improve clinical routine care, to support clinical research, and to advance personalization of medicine. To access this knowledge, the underlying data has to be semantically integrated – an essential prerequisite to which is information extraction from clinical documents.

A body of work and a good selection of openly available tools for information extraction and semantic integration in the medical domain exist, yet almost exclusively for English-language documents. For German texts the situation is rather different: research work is sparse, tools are proprietary or unpublished, and hardly any freely available textual resources exist. In this survey, we (1) describe the challenges of information extraction from German medical documents and the hurdles posed to research in this area, (2) address in particular the problems of missing German-language resources and privacy implications, and (3) identify the steps necessary to overcome these hurdles and fuel research in semantic integration of textual clinical data.

About the authors

Johannes Starlinger

Johannes Starlinger is a Research Associate at the Department of Computer Science at Humboldt-Universität zu Berlin. After studying medicine at the Medical University of Vienna and computer science at HU-Berlin, he joined the DFG-funded SOAMED graduate program in 2010 to research service-oriented architectures in a medical area of application. He received his PhD from HU-Berlin in 2015. Johannes' current research focuses on similarity search over data relevant to the biomedical domain, including scientific workflows, genomic data, and medical data.

Humboldt-Universität zu Berlin, Institut für Informatik, Unter den Linden 6, 10099 Berlin, Germany

Madeleine Kittner

Madeleine Kittner studied chemistry at TU Berlin and University of Strathclyde Glasgow. In 2011, she received a PhD in biochemistry from Universität Potsdam, Germany. She has experience in analyzing transcriptomics data, signaling pathways and text mining of Dutch medical records. Currently, she is a research associate at the Department of Computer Science at Humboldt-Universität zu Berlin focusing on text mining of biomedical documents.

Humboldt-Universität zu Berlin, Institut für Informatik, Unter den Linden 6, 10099 Berlin, Germany

Oliver Blankenstein

Oliver Blankenstein is a pediatrician at the Department of Pediatric Endocrinology and Diabetology and the head of the Newborn Screening Laboratory at Charité Universitätsmedizin Berlin. He also heads the department of endocrinology and metabolism at Labor Berlin. He has served as PI in a number of randomized controlled clinical trials and, for several years, has been advising computer science researchers in the DFG RTG SOAMED.

Charité Universitätsmedizin Berlin, Pädiatrische Endokrinologie und Diabetologie, Augustenburger Platz 1, 13353 Berlin, Germany

Ulf Leser

Ulf Leser studied informatics at TU München and obtained his PhD from TU Berlin. In 2002 he became professor for Knowledge Management in Bioinformatics at HU Berlin. His highly interdisciplinary research focuses on scientific data management, statistical Bioinformatics, biomedical text mining, and scientific workflows. He is speaker of the DFG-RTG SOAMED and a board member of the DFG-excellence RTG BSIO.

Humboldt-Universität zu Berlin, Institut für Informatik, Unter den Linden 6, 10099 Berlin, Germany

Acknowledgement

This work was partly funded by BMBF grants PERSONS (031L0030B) and PREDICT (031L0023A), and by DFG grant SOAMED (GRK1651).

Received: 2016-5-9
Revised: 2016-6-29
Accepted: 2016-9-21
Published Online: 2016-10-19
Published in Print: 2017-8-28

©2016 Walter de Gruyter Berlin/Boston

Downloaded on 26.4.2024 from https://www.degruyter.com/document/doi/10.1515/itit-2016-0027/html