A Method for Identifying Local Drug Names in Xinjiang Based on BERT-BiLSTM-CRF

Automatic Control and Computer Sciences

Abstract

This paper proposes a BERT-BiLSTM-CRF method for recognizing Xinjiang local drug names, built on the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers). BERT is pre-trained with a bidirectional Transformer structure using the MaskLM objective, in which some Chinese characters of the input sequence are randomly selected and replaced with special symbols. Word vectors are generated dynamically according to the position information of the Chinese characters in Xinjiang local drug names, and the resulting word vector sequence is fed into a bidirectional LSTM layer, which is trained to capture the dependencies within the sequence. Finally, the CRF module models the joint probability of the entire label sequence and outputs the globally optimal labeling. On the Xinjiang local drug corpus, the model achieves a precision of 95.77%, a recall of 89.47%, and an F value of 92.52% for named entity recognition. The experimental results show that BERT-BiLSTM-CRF can effectively improve the evaluation metrics for Xinjiang local drug name recognition in practical applications.
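
The final decoding step and the reported F value can be illustrated with a minimal sketch. It is not the authors' implementation: the B/I/O tag set, the toy emission and transition scores, and the function names viterbi_decode and f_score are all assumptions made for illustration. The sketch shows how scoring a whole label sequence (as a CRF layer does) can differ from picking the best label per character, and it checks the F value under the assumption that F is the harmonic mean of the first reported rate (95.77%, read here as precision) and the recall (89.47%).

```python
from typing import Dict, List, Sequence


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the tag sequence with the highest total emission + transition score."""
    # best[t][tag] is the best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical tag set and scores for a three-character input.
TAGS = ["B", "I", "O"]
# All transitions score 0 except O -> I, which is heavily penalized because
# an "inside" tag cannot follow an "outside" tag in the BIO scheme.
TRANSITIONS = {p: {c: 0.0 for c in TAGS} for p in TAGS}
TRANSITIONS["O"]["I"] = -100.0
# Toy per-character scores standing in for the BiLSTM output.
EMISSIONS = [
    {"B": 1.0, "I": 0.0, "O": 1.2},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.3, "O": 1.5},
]

greedy = [max(TAGS, key=lambda tag: e[tag]) for e in EMISSIONS]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print("per-character argmax:", greedy)
print("sequence-level decode:", decoded)
print("F from reported rates:", round(f_score(95.77, 89.47), 2))
```

With these toy scores, the per-character argmax yields O, I, O, which is not a valid BIO sequence, while sequence-level decoding yields B, I, O. The computed F is about 92.51, which agrees with the reported 92.52% up to rounding of the inputs.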



ACKNOWLEDGMENTS

I would like to thank Yang Qimeng, Hu Wei, Kang Keming, Wang Xiaozhuo, Jiang Yuan, and other students for their help and support with this article, and I extend my sincere gratitude and highest respect to them.

Funding

This work was supported by the National Natural Science Foundation of China (nos. 61563051, 61662074, 61262064), the Key Project of the National Natural Science Foundation of China (no. 61331011), the Xinjiang Uygur Autonomous Region Scientific, Technological Personnel Training Project (no. QN2016YX0051), and the Tianshan Excellent Youth Fund of Xinjiang Autonomous Region (Q011).

Author information

Corresponding author

Correspondence to Shengwei Tian.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Song, Y., Tian, S. & Yu, L. A Method for Identifying Local Drug Names in Xinjiang Based on BERT-BiLSTM-CRF. Aut. Control Comp. Sci. 54, 179–190 (2020). https://doi.org/10.3103/S0146411620030098

  • DOI: https://doi.org/10.3103/S0146411620030098
