Human-Machine Interaction for Improved Cybersecurity Named Entity Recognition Considering Semantic Similarity

Kashihara, Kazuaki; Shakarian, Jana; Baral, Chitta

doi:10.1007/978-3-030-55187-2_28

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1251)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

Many applications require the automated and timely extraction of cybersecurity information from unstructured text in online sources. Named Entity Recognition (NER) detects the relevant domain entities, such as products, attack names, malware names, and hacker group names. Training a new NER model for cybersecurity traditionally requires a corpus annotated with cybersecurity entities, and state-of-the-art methods require time-consuming, labor-intensive feature engineering. We propose a Human-Machine Interaction method for semi-automatic labeling and corpus generation for cybersecurity entities. Our method evaluates the learned NER model on the sentences collected during training, and the user selects only the correct pairs of named entity and category for the next training iteration. Each iteration therefore trains the NER model on a better corpus. Some entities are ambiguous because the word or phrase has multiple meanings. We introduce a new semantic similarity measure and use the similarity of the entire sentence to determine which category such a word belongs to. The experimental evaluation shows that our method finds undiscovered keywords of given categories better than existing methods.
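
The loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the "model" is a toy cue-word tagger rather than a trained NER model, the `review` callback stands in for the human user, and bag-of-words cosine similarity stands in for the new semantic similarity measure, whose definition the abstract does not give. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (entity text, category)


def train(sentences: List[str], confirmed: Set[Pair]) -> Dict[str, str]:
    """Toy stand-in for NER training: learn the word that precedes each confirmed entity."""
    cues: Dict[str, str] = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            for entity, category in confirmed:
                if tokens[i] == entity:
                    cues[tokens[i - 1].lower()] = category
    return cues


def predict(sentences: List[str], cues: Dict[str, str]) -> Set[Pair]:
    """Toy stand-in for NER inference: tag any token that follows a learned cue word."""
    pairs: Set[Pair] = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            category = cues.get(tokens[i - 1].lower())
            if category is not None:
                pairs.add((tokens[i], category))
    return pairs


def human_in_the_loop(sentences: List[str], seed: Set[Pair],
                      review: Callable[[Pair], bool], iterations: int = 3) -> Set[Pair]:
    """Retrain each round on the pairs the reviewer has confirmed so far."""
    confirmed = set(seed)
    for _ in range(iterations):
        cues = train(sentences, confirmed)
        proposed = predict(sentences, cues) - confirmed
        accepted = {pair for pair in proposed if review(pair)}  # user keeps only correct pairs
        if not accepted:
            break  # nothing new was confirmed, so another round would not change the corpus
        confirmed |= accepted
    return confirmed


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two sentences (an assumed stand-in measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def disambiguate(sentence: str, examples: Dict[str, List[str]]) -> str:
    """Pick the category whose example sentences are most similar to the whole sentence."""
    return max(examples, key=lambda cat: max(cosine(sentence, e) for e in examples[cat]))


if __name__ == "__main__":
    corpus = [
        "The trojan Emotet spread via email",
        "The trojan TrickBot stole credentials",
        "Attackers used trojan horses",
    ]
    # A simulated reviewer who accepts only the genuine malware name.
    result = human_in_the_loop(corpus, {("Emotet", "MALWARE")},
                               review=lambda pair: pair[0] == "TrickBot")
    print(sorted(result))
    print(disambiguate("Zeus infected the host machine", {
        "MALWARE": ["the trojan infected the host"],
        "PRODUCT": ["the vendor released the product update"],
    }))
```

In this toy run the tagger also proposes "horses" as malware, and the simulated reviewer rejects it. That rejection is the filtering step the abstract attributes to the user.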

Notes

1. https://www.cyr3con.ai

Acknowledgments

We thank Dr. Robert P. Trevino of the Maui High Performance Computing Center for useful discussions.

Author information

Authors and Affiliations

Arizona State University, Tempe, AZ, 85281, USA
Kazuaki Kashihara & Chitta Baral
Cyber Reconnaissance, Inc., Tempe, AZ, 85281, USA
Jana Shakarian

Corresponding author

Correspondence to Kazuaki Kashihara.

Editor information

Editors and Affiliations

Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia

Cite this paper

Kashihara, K., Shakarian, J., Baral, C. (2021). Human-Machine Interaction for Improved Cybersecurity Named Entity Recognition Considering Semantic Similarity. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_28

DOI: https://doi.org/10.1007/978-3-030-55187-2_28
Published: 25 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55186-5
Online ISBN: 978-3-030-55187-2
eBook Packages: Intelligent Technologies and Robotics (R0)