A Method for Identifying Local Drug Names in Xinjiang Based on BERT-BiLSTM-CRF

Song, Yuhang; Tian, Shengwei; Yu, Long

doi:10.3103/S0146411620030098

Published: 15 July 2020

Volume 54, pages 179–190 (2020)

Automatic Control and Computer Sciences

Yuhang Song¹,
Shengwei Tian¹ &
Long Yu²

Abstract

This paper proposes BERT-BiLSTM-CRF, a method for recognizing Xinjiang local drug names that embeds the BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model. BERT is pre-trained with a bidirectional Transformer architecture using the masked language model (MaskLM) objective, in which some Chinese characters of the input sequence are randomly selected and replaced with a special mask symbol. Character vectors are generated dynamically from the position information of each Chinese character within a Xinjiang local drug name, and the resulting vector sequence is fed into a bidirectional LSTM (BiLSTM) layer, which is trained to capture the dependencies within the sequence. Finally, the CRF layer models the joint probability of the entire label sequence and outputs the globally optimal labeling. On a corpus of Xinjiang local drug names, the model achieves a precision of 95.77%, a recall of 89.47%, and an F1 score of 92.52%. The experimental results show that BERT-BiLSTM-CRF effectively improves these evaluation metrics for Xinjiang local drug name recognition in practical applications.
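To make the final CRF step and the reported metrics concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a hypothetical BIO tag set (`O`, `B-DRUG`, `I-DRUG`) and hand-picked toy scores standing in for the BiLSTM outputs and the learned CRF transition weights, since the abstract does not give the actual tag scheme or parameters. `viterbi_decode` finds the globally highest-scoring tag sequence of a linear-chain CRF, and `bio_spans` plus `precision_recall_f1` compute exact-match, entity-level precision, recall, and F1, a common NER convention that is assumed here rather than taken from the paper.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path of a linear-chain CRF and its score.

    emissions[t][j]   score of tag j at position t (e.g. BiLSTM outputs)
    transitions[i][j] score of moving from tag i to tag j
    start[j]          score of beginning the sequence with tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag i for being in tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Trace back from the best final tag to recover the whole path.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, score[best_last]


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect (start, end_exclusive, type) entity spans from BIO tags."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = None, None
            if tag[:2] in ("B-", "I-"):  # a stray I- also opens a new entity
                start, etype = i, tag[2:]
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    TAGS = ["O", "B-DRUG", "I-DRUG"]  # hypothetical tag set
    NEG = -10000.0  # effectively forbids a start or transition
    start = [0.0, 0.0, NEG]  # a name cannot start with I-DRUG
    transitions = [[0.0, 0.0, NEG],  # O -> I-DRUG forbidden
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
    emissions = [[0.1, 1.5, 2.0],  # toy per-character scores for 4 characters
                 [0.2, 0.3, 1.8],
                 [1.0, 0.1, 0.9],
                 [2.0, 0.0, 0.1]]
    path, best = viterbi_decode(emissions, transitions, start)
    pred_tags = [TAGS[k] for k in path]
    print(pred_tags, round(best, 2))  # -> ['B-DRUG', 'I-DRUG', 'O', 'O'] 6.3
    gold = {(0, 2, "DRUG"), (5, 8, "DRUG")}
    pred = bio_spans(pred_tags) | {(9, 10, "DRUG")}
    print(precision_recall_f1(gold, pred))  # -> (0.5, 0.5, 0.5)
```

In the toy run, `I-DRUG` has the highest emission score at the first character, but the forbidden start score rules it out, so the decoder returns `B-DRUG I-DRUG O O`. This is the kind of global label-consistency constraint a CRF layer adds on top of independent per-character BiLSTM scores. As an arithmetic check on the abstract, 2PR/(P+R) with P = 95.77% and R = 89.47% gives about 92.51%, which agrees with the reported 92.52% to within rounding of the published precision and recall.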

ACKNOWLEDGMENTS

I would like to thank Yang Qimeng, Hu Wei, Kang Keming, Wang Xiaozhuo, Jiang Yuan, and the other students for their help and support with this work, and to extend my sincere gratitude and deepest respect to them.

Funding

This work was supported by the National Natural Science Foundation of China (nos. 61563051, 61662074, and 61262064), the Key Project of the National Natural Science Foundation of China (no. 61331011), the Xinjiang Uygur Autonomous Region Scientific and Technological Personnel Training Project (no. QN2016YX0051), and the Tianshan Excellent Youth Fund of Xinjiang Autonomous Region (no. Q011).

Author information

Authors and Affiliations

School of Software, Xinjiang University, 830008, Urumqi, China
Yuhang Song & Shengwei Tian
Network Center, Xinjiang University, 830046, Urumqi, China
Long Yu

Corresponding author

Correspondence to Shengwei Tian.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Song, Y., Tian, S. & Yu, L. A Method for Identifying Local Drug Names in Xinjiang Based on BERT-BiLSTM-CRF. Aut. Control Comp. Sci. 54, 179–190 (2020). https://doi.org/10.3103/S0146411620030098

Received: 15 August 2019
Revised: 17 January 2020
Accepted: 17 January 2020
Published: 15 July 2020
Issue Date: May 2020
DOI: https://doi.org/10.3103/S0146411620030098
