Abstract
Because the entity identification rules in the domain of ethnic ancient books differ significantly from those assumed by existing methods, general-purpose models identify domain-specific terms with poor accuracy in entity extraction tasks and fail to make effective use of boundary information to resolve ambiguous and nested Chinese entities. In this paper, we construct a small-scale named entity corpus of ethnic ancient books and propose an Ethnic Naming Entity Recognition (ENER) model that integrates entity boundary detection. In ENER, BERT is used for pre-training on the annotated ancient-book text corpus, and a Bidirectional Gated Recurrent Unit (BiGRU) encodes the contextual features of the ancient-book text. On top of the named entity recognition task, the Conditional Random Field (CRF) layer adds an auxiliary entity boundary detection task, which strengthens the model's ability to identify entity boundaries, and generates the named entity tag sequence for the ancient-book text. Experiments on the ancient-book named entity corpus and on other general Chinese datasets show the effectiveness of our approach. On the one hand, compared with the baseline BERT-BiLSTM-CRF model, ENER improves accuracy, recall, and F1 by 2.09%, 1.62%, and 1.85%, respectively, and also scores higher than the other compared models. On the other hand, ENER recognizes ancient-book named entities better on the small-scale corpus and remains stable on general Chinese datasets. It can be applied to texts containing ethnic-domain terminology and extended to more tasks in the future.
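The abstract does not specify how the auxiliary boundary task is labelled, how the two tasks are combined, or how the CRF decodes. The following is a minimal, framework-free Python sketch under stated assumptions, not the authors' implementation: the NER corpus is assumed to use BIO tags; the boundary task is assumed to use type-agnostic B/I/E/S/O labels derived from those tags; the two task losses are assumed to be summed with a hypothetical weight `weight`; and CRF decoding is shown as standard Viterbi over per-character emission scores (which in ENER would come from the BERT-BiGRU encoder). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_inclusive, entity_type)


def extract_spans(tags: Sequence[str]) -> List[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i - 1, etype))
                start, etype = None, None
            if tag != "O":
                # B-X starts a span; a stray I-X is also treated as a span start
                start, etype = i, tag[2:]
    return spans


def boundary_labels(tags: Sequence[str]) -> List[str]:
    """Derive type-agnostic boundary labels for the auxiliary task (assumed B/I/E/S/O scheme)."""
    labels = ["O"] * len(tags)
    for start, end, _ in extract_spans(tags):
        if start == end:
            labels[start] = "S"  # single-character entity
        else:
            labels[start] = "B"  # entity start boundary
            labels[end] = "E"    # entity end boundary
            for k in range(start + 1, end):
                labels[k] = "I"
    return labels


def joint_loss(ner_loss: float, boundary_loss: float, weight: float = 0.5) -> float:
    """Multi-task objective; the weighted sum and the weight value are assumptions."""
    return ner_loss + weight * boundary_loss


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Standard linear-chain CRF decoding.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final label
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    return path[::-1]
```

For example, `boundary_labels(["B-PER", "I-PER", "I-PER", "O", "B-LOC"])` yields `["B", "I", "E", "O", "S"]`: the boundary task sees only where entities start and end, not their types, which is the kind of signal an auxiliary boundary detector is meant to sharpen.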
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, L., Feng, Z., Sun, N., Lu, Y. (2024). ENER: Named Entity Recognition Model for Ethnic Ancient Books Based on Entity Boundary Detection. In: Pan, X., Jin, T., Zhang, LJ. (eds) Cognitive Computing – ICCC 2023. ICCC 2023. Lecture Notes in Computer Science, vol 14207. Springer, Cham. https://doi.org/10.1007/978-3-031-51671-9_4
DOI: https://doi.org/10.1007/978-3-031-51671-9_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51670-2
Online ISBN: 978-3-031-51671-9
eBook Packages: Computer Science, Computer Science (R0)