ND-NER: A Named Entity Recognition Dataset for OSINT Towards the National Defense Domain

Li, Xinyan; Li, Dongxu; Yang, Zhihao; Zhao, Hui; Cai, Wei; Lin, Xi

doi:10.1007/978-981-99-1642-9_31

Conference paper
First Online: 14 April 2023

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1792)

Abstract

Public data on the Internet contains a large amount of high-value open source intelligence (OSINT) for national defense. As a fundamental information extraction task, Named Entity Recognition (NER) plays a key role in question answering systems, knowledge graphs and reasoning. However, NER for the national defense domain has made little progress because no suitable datasets are available. Most previous methods work on general-purpose datasets, which lack insight into the particularities of national defense. In this paper, we propose ND-NER, a Chinese NER dataset for national defense built from data crawled from Sina Weibo. It is the first public human-annotated NER dataset for OSINT in the national defense domain, with 19 entity types and 418,227 tokens. We construct two baseline tasks and implement a series of popular models on our dataset. The empirical results show that ND-NER is a challenging dataset because of long entities with nested structure, domain specialization, ambiguous entity boundaries, and the informality and colloquialism of social media. We believe that ND-NER, published at https://github.com/XinyanLi2016/ND-NER, will encourage further exploration of OSINT in the national defense domain.

Acknowledgment

This work is supported by the National Key Research and Development Program (2019YFB2102600).

Author information

Authors and Affiliations

Software Engineering Institute, East China Normal University, Shanghai, China
Xinyan Li, Dongxu Li, Zhihao Yang & Hui Zhao
Shanghai Key Laboratory of Trustworthy Computing, Shanghai, China
Hui Zhao
The 51st Research Institute of China Electronics Technology Group Corporation, Shanghai, China
Wei Cai & Xi Lin

Corresponding author

Correspondence to Hui Zhao.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

About this paper

Cite this paper

Li, X., Li, D., Yang, Z., Zhao, H., Cai, W., Lin, X. (2023). ND-NER: A Named Entity Recognition Dataset for OSINT Towards the National Defense Domain. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1792. Springer, Singapore. https://doi.org/10.1007/978-981-99-1642-9_31

DOI: https://doi.org/10.1007/978-981-99-1642-9_31
Published: 14 April 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1641-2
Online ISBN: 978-981-99-1642-9
eBook Packages: Computer Science, Computer Science (R0)
