
Fine-grained cybersecurity entity typing based on multimodal representation learning

Published in: Multimedia Tools and Applications

Abstract

Fine-grained entity typing is crucial to improving the efficiency of research in the field of cybersecurity. However, modality limitations and the complexity of type-label hierarchies constrain both the construction of fine-grained entity typing datasets and the performance of related models. In this paper, we therefore construct a fine-grained entity typing dataset based on multimodal information from the cybersecurity literature and design a multimodal representation learning model on top of it. Specifically, we introduce a new benchmark dataset called CySets to facilitate the study of this task, and we train a novel multimodal representation learning model called Cyst-MMET with multitask objectives. The model utilizes multimodal knowledge from the literature and external sources to unify visual and textual representations, eliminating visual noise through a multi-level fusion encoder and thereby alleviating the data-bottleneck and long-tail problems in fine-grained entity typing. Experimental results show that CySets has sharper hierarchies and more diverse labels than existing datasets. Across all datasets, our model achieves state-of-the-art performance, with gains of up to 3%, demonstrating that it is effective in predicting entity types at different granularities.


Data availability

All data generated or analyzed during this study are included in this published article.

Notes

  1. https://webofscience.com/

  2. https://en.wikipedia.org/wiki/Computer_security

  3. In this paper, “/” denotes entity types at the Fine-grained level in CySets, and “./” denotes entity types at the Ultra-fine level in CySets.

  4. In the derivation of the attention mechanism, only the key steps are shown. Without loss of generality, the softmax scaling factor \(\sqrt{d}\) is omitted.

  5. https://github.com/INK-USC/PLE/blob/master/Data/README.md

  6. https://github.com/allenai/scibert

  7. https://allennlp.org/elmo

  8. https://drive.google.com/file/d/1mNM0UEt7D-e9hsMEVRQzxJ277Dt5GB7v/view
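Note 4 omits the softmax scaling factor \(\sqrt{d}\) from the derivation for brevity. For reference, the full scaled dot-product attention that the derivation abbreviates can be sketched in a few lines of NumPy; this is a minimal illustrative sketch, not the paper's implementation, and the variable names and shapes are assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V, with the sqrt(d) factor Note 4 omits."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # query-key similarity, scaled
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # attention-weighted sum of values

# Illustrative shapes: 3 queries, 5 key/value pairs, dimension d = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Dividing by \(\sqrt{d}\) keeps the dot products from growing with the dimension, which would otherwise push the softmax into near-one-hot saturation; ignoring it, as Note 4 does, changes only the temperature of the softmax, not the structure of the derivation.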


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61862063, 61502413, and 61262025; the Science Foundation of Young and Middle-aged Academic and Technical Leaders of Yunnan under Grant No. 202205AC160040; the Science Foundation of Yunnan Jinzhi Expert Workstation under Grant No. 202205AF150006; the Major Project of Yunnan Natural Science Foundation under Grant No. 202202AE090066; the Science and Technology Project of Yunnan Power Grid Co., Ltd. under Grant No. YNKJXM20222254; and the Science Foundation of the “Knowledge-driven intelligent software engineering innovation team”.


Author information

Authors and Affiliations

Authors

Contributions

In this paper, we constructed a fine-grained entity typing dataset based on multimodal information from the cybersecurity literature and designed a multimodal representation learning model on top of it. Baolei Wang completed the model design and experimental analysis and wrote the core chapters of the paper. Xuan Zhang is the corresponding author and provided conceptual guidance for this article. Gao Chen completed the second chapter, and Jishu Wang and Linyu Li completed Figs. 9, 10, and 11 in the fourth part. Qing Duan provided grammatical help with the writing. All authors composed the rest of the manuscript and reviewed the whole manuscript.

Corresponding author

Correspondence to Xuan Zhang.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, B., Zhang, X., Wang, J. et al. Fine-grained cybersecurity entity typing based on multimodal representation learning. Multimed Tools Appl 83, 30207–30232 (2024). https://doi.org/10.1007/s11042-023-16839-z

