Abstract
Aimed at identifying specific types of entities, biomedical named entity recognition (BNER) is a fundamental task in biomedical text processing. This paper presents a conditional random fields (CRFs)-based approach to learning disease entities by identifying their boundaries in text. Two types of word representation features are proposed and used: word embedding features and cluster-based features. In addition, a feature derived from an external disease dictionary is explored in the learning process. On the publicly available NCBI disease corpus, we evaluate the CRFs-based model with combinations of these word representation features. The results show that these features significantly improve BNER performance, with an increase of 24.7% in F1 measure, demonstrating the effectiveness of the proposed features and the feature-empowered approach.
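To make the three feature families concrete, the sketch below builds per-token feature dictionaries of the kind a linear-chain CRF toolkit consumes (for example, sklearn-crfsuite accepts each sentence as a list of such dicts). It is a minimal illustration under stated assumptions, not the authors' implementation: the lookup tables `WORD_CLUSTERS`, `EMBEDDINGS` and `DISEASE_DICT` are hypothetical toy data, the dictionary feature uses a greedy longest-match BIO scheme chosen here for simplicity, and the exact feature templates used in the paper are not given in this abstract.

```python
from typing import Dict, List, Sequence, Union

Feature = Union[str, float, bool]

# Hypothetical toy resources. In practice clusters and embeddings would be
# learned from a large unlabeled corpus and the lexicon loaded from an
# external disease dictionary.
WORD_CLUSTERS: Dict[str, str] = {"cancer": "C17", "tumor": "C17", "breast": "C42"}
EMBEDDINGS: Dict[str, List[float]] = {"breast": [0.12, -0.40], "cancer": [0.85, 0.10]}
DISEASE_DICT = {("breast", "cancer"), ("cancer",), ("tumor",)}
MAX_ENTRY_LEN = max(len(entry) for entry in DISEASE_DICT)


def dictionary_tags(tokens: Sequence[str]) -> List[str]:
    """Greedy longest-match lexicon lookup, returning B/I/O tags per token."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest possible dictionary entry first.
        for n in range(min(MAX_ENTRY_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in DISEASE_DICT:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
                i += n
                break
        else:
            i += 1  # no entry starts at position i
    return tags


def token_features(tokens: Sequence[str], i: int,
                   dict_tags: Sequence[str]) -> Dict[str, Feature]:
    """Combine lexical, cluster, embedding and dictionary features for token i."""
    word = tokens[i]
    low = word.lower()
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "word.lower": low,
        "word.istitle": word.istitle(),
        "suffix3": low[-3:],
        "cluster": WORD_CLUSTERS.get(low, "UNK"),  # cluster-based feature
        "dict": dict_tags[i],                      # external dictionary feature
    }
    # Word embedding feature: one real-valued attribute per dimension.
    for d, value in enumerate(EMBEDDINGS.get(low, [])):
        feats[f"emb{d}"] = value
    # A context feature: the cluster of the previous token.
    if i > 0:
        feats["-1:cluster"] = WORD_CLUSTERS.get(tokens[i - 1].lower(), "UNK")
    else:
        feats["BOS"] = True
    return feats


def sentence_features(tokens: Sequence[str]) -> List[Dict[str, Feature]]:
    tags = dictionary_tags(tokens)
    return [token_features(tokens, i, tags) for i in range(len(tokens))]
```

For the sentence `["Risk", "of", "breast", "cancer", "."]`, the dictionary feature marks "breast cancer" as a B/I span and both tokens receive embedding values, while out-of-vocabulary tokens fall back to the `"UNK"` cluster. Representing embeddings as one float attribute per dimension is only one option; discretising or clustering the vectors is a common alternative.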
Notes
For the 2013 NIH corpus, clustering words into 500 clusters takes about 19 h, whereas clustering them into 1000 clusters takes about 89 h, on a machine with an i5 CPU and 8 GB of memory.
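A quick back-of-the-envelope comparison of the two reported timings shows how steeply the clustering cost grows with the number of clusters $C$. The power-law cost model $T \propto C^{k}$ used for the exponent is an assumption made only for this estimate; the note does not state the clustering algorithm or its complexity.

$$
\frac{T_{1000}}{T_{500}} \approx \frac{89\ \text{h}}{19\ \text{h}} \approx 4.7,
\qquad
k = \frac{\log(89/19)}{\log(1000/500)} \approx \frac{1.544}{0.693} \approx 2.2
$$

Under that assumption, doubling the number of clusters more than quadruples the running time in this measurement.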
Acknowledgements
The work was substantially supported by the National Natural Science Foundation of China (Nos. 61572145 and 61403088), the Frontier and Key Technology Innovation Special Grant of Guangdong Province (No. 2014B010118005), the Public Interest Research and Capability Building Grant of Guangdong Province (No. 2014A020221039), and the Innovative School Project in Higher Education of Guangdong Province (No. YQ2015062).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Xie, W., Fu, S., Jiang, S., Hao, T. (2017). A CRFs-Based Approach Empowered with Word Representation Features to Learning Biomedical Named Entities from Medical Text. In: Huang, TC., Lau, R., Huang, YM., Spaniol, M., Yuen, CH. (eds) Emerging Technologies for Education. SETE 2017. Lecture Notes in Computer Science, vol 10676. Springer, Cham. https://doi.org/10.1007/978-3-319-71084-6_61
DOI: https://doi.org/10.1007/978-3-319-71084-6_61
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71083-9
Online ISBN: 978-3-319-71084-6
eBook Packages: Computer Science, Computer Science (R0)