Human-Machine Interaction for Improved Cybersecurity Named Entity Recognition Considering Semantic Similarity

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1251)

Abstract

Many applications depend on the automated, timely extraction of cybersecurity information from unstructured text in online sources. Named Entity Recognition (NER) detects the relevant domain entities, such as products, attack names, malware names, and hacker group names. Training a new NER model for cybersecurity is costly: traditional NER requires a training corpus annotated with cybersecurity entities, and state-of-the-art methods require time-consuming, labor-intensive feature engineering. We propose a Human-Machine Interaction method for semi-automatic labeling and corpus generation for cybersecurity entities. In each iteration, our method evaluates the learned NER model on the sentences collected during training, and the user selects only the correct pairs of named entity and category for the next training iteration. Each iteration therefore trains the NER model on a better corpus than the last. Some entities are ambiguous because the word or phrase has multiple meanings. We introduce a new semantic similarity measure and use the semantic similarity of the entire sentence to determine which category such a word belongs to. Experimental evaluation shows that our method outperforms existing methods at finding previously undiscovered keywords of given categories.
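
The two ideas in the abstract, sentence-level disambiguation and reviewer-filtered corpus growth, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the new similarity measure, so plain bag-of-words cosine similarity stands in for it, and retraining the NER model is omitted. The function names (`resolve_category`, `review_round`), the category labels, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased token counts: a crude stand-in for a sentence representation."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def resolve_category(sentence, examples_by_category):
    """Assign the category whose best-matching example sentence is most similar.

    ASSUMPTION: cosine over bags of words replaces the paper's own measure.
    """
    target = bag_of_words(sentence)
    scores = {
        category: max(cosine(target, bag_of_words(s)) for s in sentences)
        for category, sentences in examples_by_category.items()
    }
    return max(scores, key=scores.get), scores


def review_round(corpus, candidates, reviewer):
    """Add only the (sentence, entity, category) triples the reviewer confirms."""
    return corpus + [c for c in candidates if reviewer(c)]


# Hypothetical, already-confirmed example sentences for each category.
examples = {
    "malware": [
        "The trojan was delivered as an email attachment and installed "
        "a backdoor on infected machines."
    ],
    "hacker_group": [
        "The gang stole money from banks and its members were arrested by police."
    ],
}

# "Carbanak" names both a malware family and a criminal group.
sentence = "Carbanak members were arrested after the gang stole money from several banks."
category, scores = resolve_category(sentence, examples)

candidates = [
    (sentence, "Carbanak", category),
    (sentence, "banks", "product"),  # a wrong proposal the reviewer rejects
]
# The lambda simulates the human reviewer's accept/reject decision.
corpus = review_round([], candidates, reviewer=lambda c: c[1] == "Carbanak")
print(category, corpus)
```

In a real system the reviewer callback would be an interactive prompt, and the grown corpus would be used to retrain the NER model before the next round.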

Notes

  1. https://www.cyr3con.ai

Acknowledgments

We thank Dr. Robert P. Trevino of the Maui High Performance Computing Center for useful discussions.

Author information

Corresponding author

Correspondence to Kazuaki Kashihara.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kashihara, K., Shakarian, J., Baral, C. (2021). Human-Machine Interaction for Improved Cybersecurity Named Entity Recognition Considering Semantic Similarity. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_28
