TRAPPER: Learning with Weak Supervision for Threat Intelligence Entity Recognition

Authors:

Jiaxing Song

AISS '22: Proceedings of the 4th International Conference on Advanced Information Science and System

Article No.: 50, Pages 1 - 7

https://doi.org/10.1145/3573834.3574526

Published: 17 January 2023

Abstract

The emergence of threat intelligence provides a stronger foundation for tracing the source of network attacks, but it also requires a significant amount of manual analysis. Although data-driven automatic information extraction can effectively reduce this manual effort, it is limited by a lack of labeled data in the threat intelligence domain. To overcome this limitation, we propose TRAPPER, a threat entity recognition framework that infers real threat entities from unlabeled threat sentences, avoiding laborious manual labeling. TRAPPER relies on label functions and three components (a label aggregator, a label predictor, and a label expander), which guide the model with weak supervision and use transferred knowledge as an aid. The label functions let us inject expert knowledge into the label aggregator, which generates the inputs needed by the label predictor; this enables the label predictor to learn to recognize threat entities. The label expander combines the multi-source noisy label information with transferred entity recognition semantic knowledge to further expand the recognized entities. Throughout the process, the components reinforce one another by learning from each other. Comparative experiments on three threat intelligence-related datasets show that our method effectively identifies threat entities and achieves a maximum F1 score improvement of 6.3% over the best baseline.
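The general idea of label functions feeding a label aggregator can be illustrated with a minimal Python sketch. This is not the TRAPPER implementation: the abstract does not specify how the aggregator works, so a simple per-token majority vote stands in for it here. The label names (`VULN`, `MALWARE`, `IP`), the gazetteer, the cue words, and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

ABSTAIN = "ABSTAIN"  # a label function returns this when it has no opinion

# Hypothetical expert knowledge; real lists would be far larger.
MALWARE_NAMES = {"emotet", "wannacry", "trickbot"}
MALWARE_CUES = {"trojan", "ransomware", "malware"}

CVE_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)
IPV4_PATTERN = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def lf_cve(tokens, i):
    """Vote VULN for tokens shaped like a CVE identifier."""
    return "VULN" if CVE_PATTERN.match(tokens[i]) else ABSTAIN


def lf_ipv4(tokens, i):
    """Vote IP for tokens shaped like a dotted IPv4 address."""
    return "IP" if IPV4_PATTERN.match(tokens[i]) else ABSTAIN


def lf_malware_gazetteer(tokens, i):
    """Vote MALWARE for tokens found in the (hypothetical) gazetteer."""
    return "MALWARE" if tokens[i].lower() in MALWARE_NAMES else ABSTAIN


def lf_malware_context(tokens, i):
    """Vote MALWARE for a capitalized token followed by a cue word."""
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    if tokens[i][:1].isupper() and nxt in MALWARE_CUES:
        return "MALWARE"
    return ABSTAIN


LABEL_FUNCTIONS = [lf_cve, lf_ipv4, lf_malware_gazetteer, lf_malware_context]


def aggregate(tokens, label_functions):
    """Majority vote over non-abstaining label functions; 'O' if all abstain.

    Assumption: majority voting replaces the paper's label aggregator,
    whose actual design is not described in the abstract.
    """
    labels = []
    for i in range(len(tokens)):
        votes = [lf(tokens, i) for lf in label_functions]
        votes = [v for v in votes if v != ABSTAIN]
        labels.append(Counter(votes).most_common(1)[0][0] if votes else "O")
    return labels


if __name__ == "__main__":
    sentence = "The Emotet trojan exploited CVE-2017-0144 from 10.0.0.5"
    tokens = sentence.split()
    for token, label in zip(tokens, aggregate(tokens, LABEL_FUNCTIONS)):
        print(f"{token}\t{label}")
```

In a weak supervision pipeline of this kind, the aggregated (noisy) labels would then serve as training targets for a sequence tagger, which is the role the abstract assigns to the label predictor.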

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation

Published In

AISS '22: Proceedings of the 4th International Conference on Advanced Information Science and System

November 2022

396 pages

ISBN: 9781450397933

DOI: 10.1145/3573834

Copyright © 2022 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Opening Project of Intelligent Policing Key Laboratory of Sichuan Province
Criminal Examination Key Laboratory of Sichuan Province

Conference

AISS 2022: 2022 4th International Conference on Advanced Information Science and System

November 25 - 27, 2022

Sanya, China

Acceptance Rates

Overall Acceptance Rate 41 of 95 submissions, 43%
