DOI: 10.1145/3239438.3239445
research-article

Towards Automated Knowledge Discovery of Hepatocellular Carcinoma: Extract Patient Information from Chinese Clinical Reports

Published: 08 June 2018

ABSTRACT

Objectives: Determining significant prognostic risk factors requires that patient information be quantified accurately according to the extent of disease. An essential step in predicting prognostic risk factors is determining patient features, which are typically hidden in electronic medical records (EMR). The goal of this study is to extract clinical entities from Chinese clinical reports, enabling automated hepatocellular carcinoma knowledge extraction.

Materials and Methods: We annotated hepatocellular carcinoma corpora using patient records from an EMR database, and we present an information extraction solution based on an assembled method. Our evaluation dataset contains 3996 training sentences and 1570 test sentences. The evaluation metrics are precision, recall, and F1 under exact matching.
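
The precision, recall, and F1 named above can be illustrated with a minimal sketch. It assumes the common entity-level convention in which a predicted entity counts as correct only if its start offset, end offset, and type all equal those of an annotated entity; the abstract does not spell out its exact scoring rule, so this is an assumption. The function name `exact_match_prf`, the `(start, end, type)` tuple layout, and the entity labels are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def exact_match_prf(gold: Iterable[Entity],
                    pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact matching.

    A prediction is a true positive only if an identical
    (start, end, type) tuple exists in the gold annotations.
    """
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (not from the paper): the second prediction has a wrong end
# offset, so it does not count, and the fourth has no gold counterpart.
gold = [(0, 2, "Symptom"), (5, 8, "Disease"), (10, 12, "Test")]
pred = [(0, 2, "Symptom"), (5, 7, "Disease"), (10, 12, "Test"), (14, 16, "Drug")]

p, r, f = exact_match_prf(gold, pred)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

With two of four predictions matching two of three gold entities, the toy example gives a precision of 0.5 and a recall of about 0.667. A boundary error of a single character turns a prediction into both a false positive and a false negative, which is why exact matching is a strict criterion.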

Results and Conclusions: Named entity recognition (NER) on admission reports, radiology reports, and discharge summaries achieved F1 scores of 0.8449, 0.5935, and 0.7320, respectively. The overall F1 for relation extraction (RE) is 0.9129. This study lays a foundation for larger population studies to identify clinical features of hepatocellular carcinoma.


    • Published in

      ICMHI '18: Proceedings of the 2nd International Conference on Medical and Health Informatics
      June 2018
      270 pages
      ISBN: 9781450363891
      DOI: 10.1145/3239438

      Copyright © 2018 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article
      • Research
      • Refereed limited
