A CRFs-Based Approach Empowered with Word Representation Features to Learning Biomedical Named Entities from Medical Text

Xie, Wenxiu; Fu, Sihui; Jiang, Shengyi; Hao, Tianyong

doi:10.1007/978-3-319-71084-6_61

Conference paper
First Online: 18 November 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10676)

Abstract

Biomedical named entity recognition (BNER), which aims to identify specific types of entities, is a fundamental task in biomedical text processing. This paper presents a CRFs-based approach to learning disease entities by identifying their boundaries in texts. Two types of word representation features are proposed and used: word embedding features and cluster-based features. In addition, an external disease dictionary feature is explored in the learning process. Using the publicly available NCBI disease corpus, we evaluate the performance of the CRFs-based model with the combination of these word representation features. The results show that using these features can significantly improve BNER performance, with an increase of 24.7% in F1 measure, demonstrating the effectiveness of the proposed features and the feature-empowered approach.
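As an illustration of what "identifying boundaries" with a dictionary feature can look like in practice, the following is a minimal sketch, not the authors' implementation. It labels tokens with a B/I/O scheme and adds a longest-match dictionary flag, then emits tab-separated rows with the label in the last column, a layout commonly used for CRF toolkits. The function names, the `B-Disease`/`I-Disease` label names, the `Y`/`N` flag values, and the toy dictionary are all assumptions made for this example.

```python
from typing import List, Set, Tuple


def bio_tags(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Label tokens with B/I/O given entity spans as (start, end) token
    indices, with end exclusive. Label names are hypothetical."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-Disease"
        for i in range(start + 1, end):
            tags[i] = "I-Disease"
    return tags


def dictionary_flags(
    tokens: List[str], dictionary: Set[Tuple[str, ...]], max_len: int = 5
) -> List[str]:
    """Flag tokens covered by a greedy longest-match lookup in a
    lowercased phrase dictionary (an assumed form of dictionary feature)."""
    flags = ["N"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                flags[j] = "Y"
            i += matched
        else:
            i += 1
    return flags


def to_crf_rows(tokens: List[str], flags: List[str], tags: List[str]) -> List[str]:
    """Join token, dictionary flag, and label into tab-separated rows."""
    return ["\t".join(cols) for cols in zip(tokens, flags, tags)]


# Toy sentence and toy dictionary, invented for illustration only.
tokens = ["Mutations", "cause", "cystic", "fibrosis", "."]
toy_dictionary = {("cystic", "fibrosis")}
tags = bio_tags(tokens, [(2, 4)])
flags = dictionary_flags(tokens, toy_dictionary)
rows = to_crf_rows(tokens, flags, tags)
```

In a setup like this, word embedding and cluster-based features would simply be further columns between the token and the label; which columns the model actually uses is then decided by the toolkit's feature template.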

Notes

1. http://taku910.github.io/crfpp/
2. https://www.ncbi.nlm.nih.gov/pubmed/
3. http://www.nactem.ac.uk/GENIA/tagger/
4. https://nlp.stanford.edu/projects/glove/
5. https://github.com/percyliang/brown-cluster/
6. https://www.pharmgkb.org/downloads/
7. For the 2013 NIH corpus, clustering words into 500 clusters takes about 19 h, whereas clustering into 1000 clusters takes about 89 h, on a machine with an i5 CPU and 8 GB of memory.
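
Notes 5 and 7 concern Brown clustering, which assigns each word a bit-string path in a binary cluster hierarchy. The sketch below shows one way cluster-based features might be derived from such paths, by taking prefixes of the bit string at several lengths. It assumes a tab-separated input of the form bit string, word, count; the prefix lengths (4, 6, 10), the `NONE` placeholder, the function names, and the toy data are hypothetical choices for this example and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Sequence


def load_paths(lines: Iterable[str]) -> Dict[str, str]:
    """Parse lines of the assumed form '<bit string>\\t<word>\\t<count>'
    into a word -> bit-string mapping. Malformed lines are skipped."""
    word_to_path: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue
        path, word = parts[0], parts[1]
        word_to_path[word] = path
    return word_to_path


def cluster_features(
    word: str,
    word_to_path: Dict[str, str],
    prefix_lengths: Sequence[int] = (4, 6, 10),
) -> List[str]:
    """Return one feature per prefix length: the leading bits of the
    word's cluster path, or 'NONE' for out-of-vocabulary words."""
    path = word_to_path.get(word)
    if path is None:
        return ["NONE"] * len(prefix_lengths)
    # Slicing past the end of a short path just returns the whole path.
    return [path[:n] for n in prefix_lengths]


# Toy cluster paths, invented for illustration only.
toy_paths = ["0010\tfibrosis\t12", "001011\tanemia\t7", "1101\tthe\t950"]
word_to_path = load_paths(toy_paths)
anemia_feats = cluster_features("anemia", word_to_path)
unknown_feats = cluster_features("unknownword", word_to_path)
```

Shorter prefixes group words into coarser clusters and longer prefixes into finer ones, so using several lengths at once lets a model back off between granularities without rerunning the expensive clustering step described in note 7.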

Acknowledgements

The work was substantially supported by the National Natural Science Foundation of China (Nos. 61572145 and 61403088), the Frontier and Key Technology Innovation Special Grant of Guangdong Province (No. 2014B010118005), the Public Interest Research and Capability Building Grant of Guangdong Province (No. 2014A020221039), and the Innovative School Project in Higher Education of Guangdong Province (No. YQ2015062).

Author information

Authors and Affiliations

School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
Wenxiu Xie, Sihui Fu, Shengyi Jiang & Tianyong Hao
Collaborative Innovation Center for 21st-Century Maritime Silk Road Studies, Guangdong University of Foreign Studies, Guangzhou, China
Tianyong Hao

Corresponding authors

Correspondence to Shengyi Jiang or Tianyong Hao.

Editor information

Editors and Affiliations

National Taichung University of Science and Technology, Taichung City, Taiwan
Tien-Chi Huang
City University of Hong Kong, Kowloon, Hong Kong
Rynson Lau
Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan
Yueh-Min Huang
University of Caen Normandie, Caen, France
Marc Spaniol
City University of Hong Kong, Kowloon, Hong Kong
Chun-Hung Yuen

Cite this paper

Xie, W., Fu, S., Jiang, S., Hao, T. (2017). A CRFs-Based Approach Empowered with Word Representation Features to Learning Biomedical Named Entities from Medical Text. In: Huang, TC., Lau, R., Huang, YM., Spaniol, M., Yuen, CH. (eds) Emerging Technologies for Education. SETE 2017. Lecture Notes in Computer Science, vol 10676. Springer, Cham. https://doi.org/10.1007/978-3-319-71084-6_61

DOI: https://doi.org/10.1007/978-3-319-71084-6_61
Published: 18 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71083-9
Online ISBN: 978-3-319-71084-6
eBook Packages: Computer Science, Computer Science (R0)
