A CRFs-Based Approach Empowered with Word Representation Features to Learning Biomedical Named Entities from Medical Text

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10676)

Abstract

Biomedical named entity recognition (BNER), which aims to identify specific types of entities, is a fundamental task in biomedical text processing. This paper presents a CRFs-based approach to learning disease entities by identifying their boundaries in text. Two types of word representation features are proposed and used: word embedding features and cluster-based features. In addition, an external disease dictionary feature is explored in the learning process. Using the publicly available NCBI disease corpus, we evaluate the performance of the CRFs-based model with combinations of these word representation features. The results show that using these features can significantly improve BNER performance, with an increase of 24.7% in F1 measure, demonstrating the effectiveness of the proposed features and the feature-empowered approach.
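To make the feature types named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a BIO tagging scheme for entity boundaries and a sign-based discretisation of embedding dimensions; neither is specified in the abstract. The function names, feature names, and toy data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def token_features(
    tokens: List[str],
    i: int,
    clusters: Dict[str, str],
    embeddings: Dict[str, List[float]],
    disease_terms: Set[str],
) -> Dict[str, str]:
    """Build an illustrative feature dict for tokens[i].

    All feature names are assumptions made for this sketch.
    """
    word = tokens[i]
    lower = word.lower()
    feats = {
        "word": lower,
        "is_capitalized": str(word[:1].isupper()),
        # Cluster-based feature: the cluster id assigned to the word, if any.
        "cluster": clusters.get(lower, "NONE"),
        # Dictionary feature: is the token listed in the external disease lexicon?
        "in_dict": str(lower in disease_terms),
        # Neighbouring words as simple context features.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Word embedding features: a CRF with categorical features cannot use raw
    # real values directly, so each dimension is discretised by sign here
    # (an assumption of this sketch).
    vec = embeddings.get(lower)
    if vec is not None:
        for d, value in enumerate(vec):
            feats[f"emb_{d}"] = "pos" if value >= 0 else "neg"
    return feats


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert B/I/O tags into (start, end) token spans, end exclusive."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:
                start = i  # tolerate an I tag that has no preceding B
        else:
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Toy example with made-up cluster ids and embedding values.
tokens = ["Patients", "with", "cystic", "fibrosis", "were", "enrolled"]
tags = ["O", "O", "B", "I", "O", "O"]
feats = token_features(
    tokens,
    3,
    clusters={"fibrosis": "1010"},
    embeddings={"fibrosis": [0.3, -0.1]},
    disease_terms={"fibrosis"},
)
spans = bio_to_spans(tags)
```

In this toy run, `spans` marks tokens 2 to 3 ("cystic fibrosis") as one disease mention, and `feats` carries the cluster, dictionary, and embedding features for "fibrosis". Feature dicts of this shape would then be passed to a CRF toolkit for training.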

Notes

  1. http://taku910.github.io/crfpp/

  2. https://www.ncbi.nlm.nih.gov/pubmed/

  3. http://www.nactem.ac.uk/GENIA/tagger/

  4. https://nlp.stanford.edu/projects/glove/

  5. https://github.com/percyliang/brown-cluster/

  6. https://www.pharmgkb.org/downloads/

  7. For the 2013 NIH corpus, it takes about 19 h to cluster words into 500 clusters, but 89 h to cluster them into 1000 clusters, on a machine with an i5 CPU and 8 GB of memory.
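Notes 5 and 7 concern Brown clustering, which assigns each word a bit-string path in a binary hierarchy. A common way to turn such paths into features is to take prefixes of several lengths, so that shorter prefixes give coarser clusters. The sketch below illustrates that idea only. It assumes a tab-separated "path, word, count" line layout for the clustering output, and the prefix lengths and function names are hypothetical, not taken from the paper.

```python
from typing import Dict, Iterable, Tuple


def load_paths(lines: Iterable[str]) -> Dict[str, str]:
    """Map each word to its bit-string path.

    Assumes lines of the form "<path>\t<word>\t<count>".
    """
    word_to_path = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        path, word = parts[0], parts[1]
        word_to_path[word] = path
    return word_to_path


def prefix_features(path: str, lengths: Tuple[int, ...] = (4, 6, 10)) -> Dict[str, str]:
    """Return one feature per prefix length; short paths are kept whole."""
    return {f"brown_{n}": path[:n] for n in lengths}


# Toy input with made-up paths and counts.
sample = ["0010\tdisease\t120", "001101\tfibrosis\t15"]
paths = load_paths(sample)
fibrosis_feats = prefix_features(paths["fibrosis"], lengths=(2, 4))
```

Here "disease" and "fibrosis" share the two-bit prefix `00`, so a model using the `brown_2` feature would treat them as members of the same coarse cluster.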

Acknowledgements

The work was substantially supported by the National Natural Science Foundation of China (Nos. 61572145 and 61403088), the Frontier and Key Technology Innovation Special Grant of Guangdong Province (No. 2014B010118005), the Public Interest Research and Capability Building Grant of Guangdong Province (No. 2014A020221039), and the Innovative School Project in Higher Education of Guangdong Province (No. YQ2015062).

Author information

Correspondence to Shengyi Jiang or Tianyong Hao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Xie, W., Fu, S., Jiang, S., Hao, T. (2017). A CRFs-Based Approach Empowered with Word Representation Features to Learning Biomedical Named Entities from Medical Text. In: Huang, TC., Lau, R., Huang, YM., Spaniol, M., Yuen, CH. (eds) Emerging Technologies for Education. SETE 2017. Lecture Notes in Computer Science, vol 10676. Springer, Cham. https://doi.org/10.1007/978-3-319-71084-6_61

  • DOI: https://doi.org/10.1007/978-3-319-71084-6_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-71083-9

  • Online ISBN: 978-3-319-71084-6

  • eBook Packages: Computer Science, Computer Science (R0)
