Cybersecurity Automated Information Extraction Techniques: Drawbacks of Current Methods, and Enhanced Extractors

Bridges, Robert; Huffer, Kelly M.; Jones, Corinne L.; Iannacone, Michael; Goodall, John

doi:10.1109/ICMLA.2017.0-122


Conference · 2018

DOI: https://doi.org/10.1109/ICMLA.2017.0-122 · OSTI ID: 1424492

Affiliation (all authors): ORNL

We address a crucial element of applied information extraction, accurate identification of basic security entities in text, by evaluating previous methods and presenting new labelers. Our survey reveals that previous efforts have not been tested on documents similar to the targeted sources (news articles, blogs, tweets, etc.) and that no sufficiently large, publicly available annotated corpus of these documents exists. By assembling a representative test corpus, we perform a quantitative evaluation of previous methods in a realistic setting, revealing an overall lack of recall and giving insight into the models' beneficial and inhibiting elements. In particular, our results show that many previous efforts overfit to the non-representative test corpora in this domain. Informed by this evaluation, we present three novel cyber entity extractors, which seek to leverage the available labeled data while remaining useful on the more diverse documents encountered in the wild. Each new model improves on the state of the art in recall, with maximal or near-maximal F1 score. Our results establish that the state of the art in cyber entity tagging is characterized by F1 = 0.61.
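The abstract reports results in terms of recall and F1. As a hedged illustration only (not the authors' evaluation code), the sketch below computes exact-match, entity-level precision, recall, and F1. It assumes each entity is a hypothetical (start, end, label) span and that a prediction counts only if all three fields match a gold entity; the paper may use a different matching criterion. The example labels are also hypothetical.

```python
from typing import Set, Tuple

# Assumed representation: (start_offset, end_offset, label). Illustrative only.
Entity = Tuple[int, int, str]


def entity_prf(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 (harmonic mean of P and R)."""
    true_positives = len(gold & predicted)  # predictions matching a gold span exactly
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the tagger finds only half of the gold entities,
# but every entity it does emit is correct.
gold = {(0, 13, "vulnerability"), (20, 27, "software"), (31, 36, "version"), (50, 58, "vendor")}
predicted = {(0, 13, "vulnerability"), (20, 27, "software")}
p, r, f = entity_prf(gold, predicted)
print(f"{p:.2f} {r:.2f} {f:.2f}")  # 1.00 0.50 0.67
```

Even with perfect precision, a recall of 0.5 caps F1 at about 0.67 in this toy case, which is why a lack of recall weighs heavily on the F1 figures summarized above.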


Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-00OR22725

OSTI ID: 1424492

Resource Relation: Conference: IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico, December 20 to 24, 2017

Country of Publication: United States

Language: English

Similar Records

Automatic Labeling for Entity Extraction in Cyber Security

Conference · 2014

Bridges, Robert A; Jones, Corinne L; Iannacone, Michael D; et al.

PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts

Conference · 2013

McNeil, Nikki C; Bridges, Robert A; Iannacone, Michael D; et al.

Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study

Journal Article · May 25, 2021 · Journal of Medical Internet Research

Daughton, Ashlynn R.; Shelley, Courtney D.; Barnard, Martha; et al.
