Abstract
Named entities for agricultural diseases and pests have complex word formation, and word combination and entities embedded within other entities are common. In Chinese agricultural disease and pest texts in particular, entity naming patterns vary, entity boundaries are fuzzy, feature extraction is inadequate, and entity boundaries are labeled inconsistently. To address these problems, this article combines discourse topic features with an attention mechanism and proposes Attention-based SoftLexicon with TF-IDF (ASLT) for agricultural disease and pest entity recognition. The method divides matched words into sets according to the position of each character within the word, merges discourse topic features into the calculation of lexical information, and introduces an attention mechanism; together, these steps improve recognition accuracy for Chinese agricultural disease and pest entities. To improve the interpretability of the model, we designed a flow chart that explains its main principles and steps, and we explain the model through visualization. We selected 1061 Chinese agricultural news texts and constructed the Corpus of Chinese Named Entities of Diseases and Pests (CCNEDP), in which a total of 7806 agricultural disease and pest named entities are labeled. In our experiments, ASLT effectively recognizes entities in Chinese agricultural texts and achieves a recognition accuracy of 93.57%, a recall of 92.79%, and an F1 value of 93.18% on CCNEDP. Compared with other entity recognition methods, ASLT performs better in both accuracy and operating efficiency. The implementation of this work is publicly available at https://github.com/azureskymoon/Lexicon-TFIDF-DTopic-master/tree/master.
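The word-set division mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard SoftLexicon scheme, in which every lexicon word matched in a sentence is assigned to the B (begin), M (middle), E (end), or S (single) set of each character it covers. The function names, the toy lexicon and corpus, and the smoothed TF-IDF normalization are all hypothetical choices made for illustration; they are not taken from the ASLT implementation, and the attention step is omitted.

```python
import math


def bmes_word_sets(sentence, lexicon, max_len=5):
    """For each character, collect matched lexicon words by the character's
    position in the word: B(egin), M(iddle), E(nd), or S(ingle)."""
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Consider every span starting at i with at most max_len characters.
        for j in range(i, min(n, i + max_len)):
            word = sentence[i:j + 1]
            if word not in lexicon:
                continue
            if i == j:
                sets[i]["S"].add(word)
                continue
            sets[i]["B"].add(word)
            sets[j]["E"].add(word)
            for k in range(i + 1, j):
                sets[k]["M"].add(word)
    return sets


def tfidf(word, document, corpus):
    """Smoothed TF-IDF of a word in one document (assumed formula, not ASLT's)."""
    tf = document.count(word) / max(len(document), 1)
    df = sum(1 for doc in corpus if word in doc)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def normalized_weights(words, document, corpus):
    """Weight the words of one set by TF-IDF, normalized to sum to 1."""
    scores = {w: tfidf(w, document, corpus) for w in words}
    total = sum(scores.values())
    if total == 0:
        # Fall back to uniform weights when no word occurs in the document.
        return {w: 1.0 / len(words) for w in words} if words else {}
    return {w: s / total for w, s in scores.items()}


# Toy data (hypothetical): three short "documents" and a four-word lexicon.
corpus = ["小麦条锈病防治技术", "水稻稻瘟病发生趋势", "小麦价格上涨"]
document = corpus[0]
sentence = "小麦条锈病"  # "wheat stripe rust"
lexicon = {"小麦", "条锈病", "锈病", "小麦条锈病"}

sets = bmes_word_sets(sentence, lexicon)
weights = normalized_weights(sets[0]["B"], document, corpus)
print(sets[0]["B"], weights)
```

In this toy example, the first character begins both "小麦" and "小麦条锈病"; because "小麦" also appears in an unrelated document, its inverse document frequency is lower, so the disease name receives the larger weight within that set.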
Acknowledgements
This work is partially supported by the Natural Science Foundation of China under Grants 31771679 and 31671589; the Major Science and Technology Project of Anhui Province, China, under Grants 18030901034 and 201904e01020006; the Natural Science Foundation of Anhui Province, China (2108085MF209); the Key Laboratory of Agricultural Electronic Commerce, Ministry of Agriculture of China, under Grants AEC2018001 and AEC2021001; the University Collaborative Innovation Project of Anhui Province, China (GXXT-2019-013); and the Natural Science Research Project of Anhui Provincial Department of Education (KJ2020A0107, KJ2021A1550).
About this article
Cite this article
Wang, C., Gao, J., Rao, H. et al. Named entity recognition (NER) for Chinese agricultural diseases and pests based on discourse topic and attention mechanism. Evol. Intel. 17, 457–466 (2024). https://doi.org/10.1007/s12065-022-00727-w
DOI: https://doi.org/10.1007/s12065-022-00727-w