NER in Threat Intelligence Domain with TSFL

Wang, Xuren; Xiong, Zihan; Du, Xiangyu; Jiang, Jun; Jiang, Zhengwei; Xiong, Mengbo

doi:10.1007/978-3-030-60450-9_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12430)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

To counter increasingly sophisticated Advanced Persistent Threat (APT) attacks, cybersecurity threat intelligence must be represented using structured or semi-structured data specifications. In this paper, we recast the extraction of indicators of compromise (IOC) as a sequence labeling task of named entity recognition. We construct a named entity recognition dataset for the threat intelligence domain and train word vectors on that domain. We also propose a new loss function, TSFL, which combines a triplet loss based on metric learning with a sorted focal loss, to address the unbalanced distribution of data labels. Experiments show that the F1 score improves on both public-domain datasets and the threat intelligence dataset.
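The abstract does not give the TSFL formula, so the following is only a rough sketch of the two ingredients it names, not the authors' loss. It assumes the standard focal loss of Lin et al. (2017) and the standard triplet loss of Schroff et al. (2015), and it combines them with a plain weighted sum. The function names, the weight `lam`, and the default `gamma` and `margin` values are all hypothetical, and the "sorted" variant of the focal loss is not reproduced here.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for a single token.

    probs  -- predicted probability for each label
    target -- index of the gold label
    The (1 - p_t) ** gamma factor shrinks the loss of confidently
    correct tokens, so frequent, easy labels (such as "O") contribute less.
    """
    p_t = max(probs[target], 1e-12)  # guard against log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss with squared Euclidean distance.

    Pulls the anchor toward a same-label embedding (positive) and pushes
    it at least `margin` further away from a different-label one (negative).
    """
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


def combined_loss(probs, target, anchor, positive, negative, lam=0.5):
    """Hypothetical combination: focal term plus a weighted triplet term.

    The weighted sum and the value of `lam` are assumptions made for
    illustration; the paper's actual TSFL definition may differ.
    """
    return focal_loss(probs, target) + lam * triplet_loss(anchor, positive, negative)
```

Under these assumptions, a token predicted with probability 0.9 for its gold label incurs a much smaller focal loss than its cross-entropy, whereas a token predicted with probability 0.1 keeps most of its loss. That down-weighting of easy examples is the usual motivation for using focal loss on imbalanced label distributions.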

Acknowledgments

We thank the corresponding authors Xuren Wang and Zihan Xiong for their help. This work is supported by the National Key Research and Development Program of China (Grant No. 2018YFC0824801, Grant No. 2016QY06X1204).

Author information

Authors and Affiliations

Information Engineering College, Capital Normal University, Beijing, 100048, China
Xuren Wang, Zihan Xiong & Mengbo Xiong
Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100093, China
Xuren Wang, Zihan Xiong, Xiangyu Du, Jun Jiang, Zhengwei Jiang & Mengbo Xiong

Corresponding authors

Correspondence to Xuren Wang or Zihan Xiong.

Editor information

Editors and Affiliations

ECE & Ingenuity Labs Research Institute, Queen’s University, Kingston, ON, Canada
Xiaodan Zhu
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Min Zhang
School of Computer Science and Technology, Soochow University, Suzhou, China
Yu Hong
College of Intelligence and Computing, Tianjin University, Tianjin, China
Ruifang He

Cite this paper

Wang, X., Xiong, Z., Du, X., Jiang, J., Jiang, Z., Xiong, M. (2020). NER in Threat Intelligence Domain with TSFL. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12430. Springer, Cham. https://doi.org/10.1007/978-3-030-60450-9_13

DOI: https://doi.org/10.1007/978-3-030-60450-9_13
Published: 02 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60449-3
Online ISBN: 978-3-030-60450-9
eBook Packages: Computer Science, Computer Science (R0)
