Leveraging Wikipedia and Context Features for Clinical Event Extraction from Mixed-Language Discharge Summary

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8870)

Abstract

Unstructured clinical texts contain patients' disease-related narratives, but mining this kind of information requires elaborate work. In particular, classifying the semantic type of a clinical term depends on domain knowledge from resources such as the Unified Medical Language System (UMLS). The UMLS, however, has limited coverage of other languages. In this paper, we leverage Wikipedia as well as the UMLS for clinical event extraction, especially from clinical narratives written in a mixture of languages. Semantic features for clinical terms are extracted from the semantic networks of hierarchical categories in Wikipedia, and semantic types for Korean clinical terms are detected using Wikipedia's translation links together with these semantic networks. An additional notable feature is a controlled vocabulary of clue words that serve as contextual evidence for determining the clinical semantic type of a word. Experiments on 150 discharge summaries written in English and Korean yielded an F1-measure of 75.9%, indicating that the proposed features are effective for clinical event extraction.
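The Wikipedia category features and clue-word context features summarized above can be pictured with a small sketch. Everything in it is a hypothetical assumption, not the authors' implementation: the toy dictionaries stand in for Wikipedia's interlanguage (translation) links and category graph, the category-to-type mapping and the labels PROBLEM and TREATMENT are invented for illustration, and `wikipedia_types`, `token_features`, `max_depth`, and `window` are illustrative names. The resulting feature dictionaries could then be passed to a sequence labeler.

```python
from collections import deque

# Hypothetical toy stand-ins for Wikipedia data; a real system would load
# interlanguage (translation) links and the category graph from Wikipedia dumps.
TRANSLATION_LINKS = {"당뇨병": "Diabetes mellitus"}  # Korean title -> English title
PAGE_CATEGORIES = {
    "Diabetes mellitus": ["Metabolic disorders"],
    "Metformin": ["Biguanides"],
}
PARENT_CATEGORIES = {
    "Metabolic disorders": ["Diseases and disorders"],
    "Biguanides": ["Drugs"],
}
# Hypothetical mapping from (ancestor) categories to clinical semantic types.
CATEGORY_TO_TYPE = {"Diseases and disorders": "PROBLEM", "Drugs": "TREATMENT"}
# Hypothetical controlled vocabulary of clue words and the type each suggests.
CLUE_WORDS = {"diagnosed": "PROBLEM", "prescribed": "TREATMENT", "started": "TREATMENT"}


def wikipedia_types(term, max_depth=3):
    """Walk up the category hierarchy breadth-first and collect semantic types."""
    # Korean terms are first mapped to English titles via translation links.
    title = TRANSLATION_LINKS.get(term, term)
    queue = deque((cat, 1) for cat in PAGE_CATEGORIES.get(title, []))
    seen, types = set(), set()
    while queue:
        cat, depth = queue.popleft()
        if cat in seen or depth > max_depth:
            continue
        seen.add(cat)
        if cat in CATEGORY_TO_TYPE:
            types.add(CATEGORY_TO_TYPE[cat])
        for parent in PARENT_CATEGORIES.get(cat, []):
            queue.append((parent, depth + 1))
    return sorted(types)


def token_features(tokens, i, window=2):
    """Feature dict for tokens[i]: Wikipedia-derived types plus nearby clue words."""
    feats = {f"wiki_type={t}": 1 for t in wikipedia_types(tokens[i])}
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        word = tokens[j].lower()
        if j != i and word in CLUE_WORDS:
            feats[f"clue={CLUE_WORDS[word]}"] = 1
    return feats
```

For the mixed-language sequence `["Patient", "diagnosed", "with", "당뇨병"]`, `token_features(tokens, 3)` maps the Korean term to the English page `Diabetes mellitus` through the translation link, reaches the `PROBLEM` type two category levels up, and picks up the clue word `diagnosed` inside the window, giving `{"wiki_type=PROBLEM": 1, "clue=PROBLEM": 1}`. Bounding the walk with `max_depth` keeps it from drifting into overly general ancestor categories.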

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Jeong, KY., Yi, W., Seol, JW., Choi, J., Lee, KS. (2014). Leveraging Wikipedia and Context Features for Clinical Event Extraction from Mixed-Language Discharge Summary. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_26

  • DOI: https://doi.org/10.1007/978-3-319-12844-3_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12843-6

  • Online ISBN: 978-3-319-12844-3

  • eBook Packages: Computer Science, Computer Science (R0)
