Leveraging Wikipedia and Context Features for Clinical Event Extraction from Mixed-Language Discharge Summary

Jeong, Kwang-Yong; Yi, Wangjin; Seol, Jae-Wook; Choi, Jinwook; Lee, Kyung-Soon

doi:10.1007/978-3-319-12844-3_26


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8870)

Abstract

Unstructured clinical texts contain narratives about patients' diseases, but mining this kind of information requires elaborate work. In particular, classifying the semantic type of a clinical term depends on domain knowledge from resources such as the Unified Medical Language System (UMLS), yet the UMLS is limited in handling other languages. In this paper, we leverage Wikipedia as well as the UMLS for clinical event extraction, particularly from clinical narratives written in a mixture of languages. Semantic features for clinical terms are extracted from the semantic networks of hierarchical categories in Wikipedia, and semantic types for Korean clinical terms are detected by using Wikipedia's translation links together with these semantic networks. An additional notable feature is a controlled vocabulary of clue words that serve as contextual evidence for determining the clinical semantic type of a word. In experiments on 150 discharge summaries written in English and Korean, the proposed method achieved an F1-measure of 75.9%. This result shows that the proposed features are effective for clinical event extraction.
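The abstract describes two kinds of features: semantic types derived from Wikipedia's category hierarchy (with Korean terms reached through translation links) and clue words found in a term's context. The following Python code is a minimal sketch of how such features could be computed for a single token, under stated assumptions; it is not the authors' implementation. The tables `LANGLINKS_KO_EN`, `CATEGORY_PARENTS`, `SEED_TYPES`, and `CLUE_WORDS` are hypothetical toy data, not real Wikipedia content or the paper's vocabulary. The type labels, the depth-limited breadth-first walk over categories, the three-token context window, and the helper names `wikipedia_semantic_types` and `token_features` are all illustrative choices invented for this example.

```python
from collections import deque

# HYPOTHETICAL toy data, not taken from real Wikipedia or from the paper.
# Translation (interlanguage) links: Korean title -> English title.
LANGLINKS_KO_EN = {"당뇨병": "Diabetes mellitus", "인슐린": "Insulin"}

# Page/category -> parent categories (a tiny stand-in for the category network).
CATEGORY_PARENTS = {
    "Diabetes mellitus": ["Endocrine diseases"],
    "Endocrine diseases": ["Diseases and disorders"],
    "Insulin": ["Peptide hormones", "Antidiabetic drugs"],
    "Peptide hormones": ["Hormones"],
    "Antidiabetic drugs": ["Drugs"],
}

# Assumed mapping from a few high-level categories to clinical semantic types.
SEED_TYPES = {"Diseases and disorders": "PROBLEM", "Drugs": "TREATMENT"}

# Assumed controlled vocabulary of clue words and the type each one hints at.
CLUE_WORDS = {
    "diagnosed": "PROBLEM",
    "complained": "PROBLEM",
    "prescribed": "TREATMENT",
    "started": "TREATMENT",
}


def wikipedia_semantic_types(term, max_depth=4):
    """Return the seed types reachable from `term` by walking up the category
    graph breadth-first, visiting nodes at most `max_depth` levels up."""
    # Follow a translation link for Korean terms; other terms are used as-is.
    title = LANGLINKS_KO_EN.get(term, term)
    found = set()
    seen = {title}
    queue = deque([(title, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in SEED_TYPES:
            found.add(SEED_TYPES[node])
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in CATEGORY_PARENTS.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return found


def token_features(tokens, i, window=3):
    """Build a feature dict for tokens[i]: the word itself, any Wikipedia-derived
    semantic types, and the types hinted at by clue words within `window` tokens."""
    feats = {"word": tokens[i].lower()}
    for sem_type in wikipedia_semantic_types(tokens[i]):
        feats["wiki_type=" + sem_type] = 1
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        clue = CLUE_WORDS.get(tokens[j].lower())
        if j != i and clue:
            feats["clue_type=" + clue] = 1
    return feats
```

For the hypothetical sentence `["Patient", "was", "diagnosed", "with", "당뇨병", ".", "He", "was", "prescribed", "인슐린"]`, `token_features(tokens, 4)` returns `{"word": "당뇨병", "wiki_type=PROBLEM": 1, "clue_type=PROBLEM": 1}`: the Korean term reaches a disease category through its translation link, and "diagnosed" falls within the three-token window. The depth limit keeps the walk from drifting into very general categories. The resulting dictionary could then be passed to whichever downstream classifier is used.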



Author information

Authors and Affiliations

Dept. of Computer Science & Engineering, CAIIT, Chonbuk National University, South Korea
Kwang-Yong Jeong, Jae-Wook Seol & Kyung-Soon Lee
Interdisciplinary Program of Bioengineering, Seoul National University, South Korea
Wangjin Yi
Dept. of Biomedical Engineering, College of Medicine, Seoul National University, South Korea
Jinwook Choi


Editor information

Editors and Affiliations

Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Azizah Jaafar
Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Nazlena Mohamad Ali
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Shahrul Azman Mohd Noah
Insight Centre for Data Analytics, Dublin City University, Glasnevin, 9, Dublin, Ireland
Alan F. Smeaton
Information Systems, Queensland University of Technology, 4001, Brisbane, QLD, Australia
Peter Bruza
Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450, Shah Alam, Selangor, Malaysia
Zainab Abu Bakar & Nursuriati Jamil
Cyber Security Center, Universiti Pertahanan Nasional Malaysia, Kem Sungai Besi, 57000, Kuala Lumpur, Malaysia
Tengku Mohd Tengku Sembok


About this paper

Cite this paper

Jeong, KY., Yi, W., Seol, JW., Choi, J., Lee, KS. (2014). Leveraging Wikipedia and Context Features for Clinical Event Extraction from Mixed-Language Discharge Summary. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_26


DOI: https://doi.org/10.1007/978-3-319-12844-3_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3
eBook Packages: Computer Science, Computer Science (R0)
