ND-NER: A Named Entity Recognition Dataset for OSINT Towards the National Defense Domain

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1792)

Abstract

Public data on the Internet contains a large amount of high-value open source intelligence (OSINT) for national defense. As a fundamental information extraction task, Named Entity Recognition (NER) plays a key role in question answering systems, knowledge graphs, and reasoning. However, NER for the national defense domain has made little progress because no datasets are available. Most previous methods work on general-purpose datasets, which do not capture the particularities of the national defense domain. In this paper, we propose ND-NER, a Chinese NER dataset for national defense built from data crawled from Sina Weibo. It is the first public human-annotated NER dataset for OSINT in the national defense domain, with 19 entity types and 418,227 tokens. We construct two baseline tasks and implement a series of popular models on our dataset. The empirical results show that ND-NER is a challenging dataset because of its long entities with nested structure, domain specialization, ambiguous entity boundaries, and the informality and colloquialism of social media. We believe that ND-NER, published at https://github.com/XinyanLi2016/ND-NER, will encourage further exploration of OSINT in the national defense domain.
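
To illustrate why nested entities are hard for flat sequence labeling, the following minimal Python sketch represents entities as token spans and shows that a single BIO tag sequence can keep only the outermost span. The sentence, the labels ORG and LOC, and the helper names are invented for illustration; they are not taken from ND-NER's annotation scheme or its 19 entity types.

```python
from typing import List, Tuple

# A span is (start, end_exclusive, label) over token indices.
Span = Tuple[int, int, str]

def is_nested(inner: Span, outer: Span) -> bool:
    """True if `inner` lies inside `outer` and the two boundaries differ."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and inner[:2] != outer[:2])

def outermost(spans: List[Span]) -> List[Span]:
    """Drop every span that is nested inside another span."""
    return [s for s in spans if not any(is_nested(s, o) for o in spans)]

def to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Build one flat BIO tag per token from non-overlapping spans."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

# Hypothetical example: a location name nested inside an organization name.
tokens = ["Pacific", "Fleet", "Headquarters"]
spans = [(0, 3, "ORG"), (0, 1, "LOC")]

# Flat tagging has one tag per token, so the inner LOC span is lost.
flat = to_bio(len(tokens), outermost(spans))
print(flat)
```

A span-based representation keeps both entities, which is one reason nested NER is often treated as a separate task from flat tagging.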

Acknowledgment

This work is supported by the National Key Research and Development Program (2019YFB2102600).

Author information

Corresponding author

Correspondence to Hui Zhao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, X., Li, D., Yang, Z., Zhao, H., Cai, W., Lin, X. (2023). ND-NER: A Named Entity Recognition Dataset for OSINT Towards the National Defense Domain. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1792. Springer, Singapore. https://doi.org/10.1007/978-981-99-1642-9_31

  • DOI: https://doi.org/10.1007/978-981-99-1642-9_31

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1641-2

  • Online ISBN: 978-981-99-1642-9

  • eBook Packages: Computer Science, Computer Science (R0)
