UZNER: A Benchmark for Named Entity Recognition in Uzbek

Yusufu, Aizihaierjiang; Jiang, Liu; Ainiwaer, Abidan; Teng, Chong; Yusufu, Aizierguli; Li, Fei; Ji, Donghong

doi:10.1007/978-3-031-44693-1_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14302)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

Named entity recognition (NER) is a key task in natural language processing, and it provides semantic information needed by many downstream tasks. However, NER performance is often limited by the available language resources. For low-resource languages, NER usually performs poorly because labeled data and pre-trained models are scarce. To address this issue, we manually constructed a large-scale, high-quality NER corpus for Uzbek and experimented with various NER methods. We improved state-of-the-art baseline models by introducing additional features and data translations. Data translation enables the model to learn richer syntactic structure and semantic information. Affix features provide knowledge at the morphological level and play an important role in identifying oversimplified low-frequency entity labels. Our data and models will be made available to facilitate low-resource NER.

A. Yusufu and L. Jiang: co-authors.

Notes

1. Code is available at https://github.com/azhar520/NER.
2. https://qalampir.uz.
3. https://www.uz.

Acknowledgment

This work is supported by the Natural Science Program of Xinjiang Uygur Autonomous Region for the Construction of Innovation Environment (Talents and Bases) (Special Training of Scientific and Technological Talents of Ethnic Minorities) (2022D03001), the National Natural Science Foundation of China (No. 62176187; No. 61662081), the Major Projects of the National Social Science Foundation of China (No. 11&ZD189; No. 14AZD11), the National Key Research and Development Program of China (No. 2017YFC1200500), and the Research Foundation of Ministry of Education of China (No. 18JZD015).

Author information

Authors and Affiliations

Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, China
Aizihaierjiang Yusufu, Liu Jiang, Chong Teng, Fei Li & Donghong Ji
School of Information Management, Wuhan University, Wuhan, China
Abidan Ainiwaer
School of Computer Science and Technology, Xinjiang Normal University, Urumqi, China
Aizierguli Yusufu

Corresponding author

Correspondence to Donghong Ji.

Editor information

Editors and Affiliations

Emory University, Atlanta, GA, USA
Fei Liu
Microsoft Research Asia, Beijing, China
Nan Duan
Soochow University, Suzhou, China
Qingting Xu
Soochow University, Suzhou, China
Yu Hong

Cite this paper

Yusufu, A. et al. (2023). UZNER: A Benchmark for Named Entity Recognition in Uzbek. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_14

DOI: https://doi.org/10.1007/978-3-031-44693-1_14
Published: 08 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44692-4
Online ISBN: 978-3-031-44693-1
eBook Packages: Computer Science, Computer Science (R0)
