A Framework for Document-level Cybersecurity Event Extraction from Open Source Data | IEEE Conference Publication | IEEE Xplore


Abstract:

With the rapid development of the Internet, the number of cyber threats is increasing exponentially. More and more cyber threats come from new and unexpected sources, leaving organizations and individuals facing greater security risks and vulnerabilities. Automatically obtaining and structuring security information from cybersecurity news can help security analysts identify useful information more quickly. Most existing studies on extracting security events focus only on the event detection task, which aims to discover and categorize cybersecurity events in plain text. However, event detection alone cannot capture useful details such as who performed a cyberattack, when a data breach happened, or who the victim was. Analysts need these event arguments to obtain the details of a cybersecurity event directly. Several studies have tried to extract richer semantic information about cybersecurity events, but they extract event arguments only within the scope of a single sentence, so they fall short when the arguments to be recognized are spread across multiple sentences. In this paper, we propose a framework that effectively extracts cybersecurity events at the document level from cybersecurity news, blogs, and announcements. We model document-level event extraction as a sequence tagging problem whose goal is to identify the arguments of cybersecurity events in documents. First, we obtain character embeddings and incorporate word information into the character representations. Then, we design a sliding-window mechanism to capture cross-sentence context information. Finally, we predict the label of each character. We build a Chinese cybersecurity dataset and use three methods to evaluate our approach, and the experimental results demonstrate the effectiveness of the proposed model.
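The abstract names three stages: character representations enriched with word information, a sliding window for cross-sentence context, and per-character label prediction. The sketch below illustrates only the non-neural plumbing around those stages, under stated assumptions. Word information is gathered with SoftLexicon-style matching (each character collects the lexicon words in which it is the Begin, Middle, End, or Single character); this is a swapped-in, commonly used technique, not necessarily the paper's. The window `size`/`stride` defaults, the BIO tagging scheme, and the role names `Attacker`/`Victim` are hypothetical. The neural encoder that would actually produce the tags is omitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sketch; the paper's actual encoder, lexicon, and window
# parameters are not specified in the abstract.


def lexicon_features(sentence: str, lexicon: Set[str],
                     max_len: int = 4) -> List[Dict[str, Set[str]]]:
    """SoftLexicon-style matching (an assumption): for every character,
    collect lexicon words where it is the Begin, Middle, End, or Single char."""
    feats = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Try every substring starting at i, up to max_len characters long.
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                feats[i]["S"].add(word)
            else:
                feats[i]["B"].add(word)
                feats[j - 1]["E"].add(word)
                for k in range(i + 1, j - 1):
                    feats[k]["M"].add(word)
    return feats


def sliding_windows(num_sentences: int, size: int = 3,
                    stride: int = 1) -> List[Tuple[int, int]]:
    """Half-open sentence ranges [start, end) so the tagger sees context
    beyond one sentence. Keep stride <= size so no sentence is skipped."""
    if size < 1 or stride < 1:
        raise ValueError("size and stride must be positive")
    windows: List[Tuple[int, int]] = []
    start = 0
    while start < num_sentences:
        end = min(start + size, num_sentences)
        windows.append((start, end))
        if end == num_sentences:
            break
        start += stride
    return windows


def decode_bio(chars: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-character BIO tags into (role, argument text) pairs.
    An I- tag that does not continue a span of the same role is treated as O."""
    spans: List[Tuple[str, str]] = []
    cur_role, cur_chars = None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if cur_role:
                spans.append((cur_role, "".join(cur_chars)))
            cur_role, cur_chars = tag[2:], [ch]
        elif tag.startswith("I-") and cur_role == tag[2:]:
            cur_chars.append(ch)
        else:
            if cur_role:
                spans.append((cur_role, "".join(cur_chars)))
            cur_role, cur_chars = None, []
    if cur_role:
        spans.append((cur_role, "".join(cur_chars)))
    return spans
```

For example, `decode_bio("黑客攻击了银行", ["B-Attacker", "I-Attacker", "O", "O", "O", "B-Victim", "I-Victim"])` yields `[("Attacker", "黑客"), ("Victim", "银行")]`. Overlapping windows mean a sentence can be tagged more than once; how those predictions are merged is a design choice the abstract does not describe.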
Date of Conference: 05-07 May 2021
Date Added to IEEE Xplore: 28 May 2021
Conference Location: Dalian, China