
Fine-grained cybersecurity entity typing based on multimodal representation learning

Published in: Multimedia Tools and Applications

Abstract

Fine-grained entity typing is crucial to improving the efficiency of research in the field of cybersecurity. However, modality limitations and the complexity of type-label hierarchies constrain both the construction of fine-grained entity typing datasets and the performance of related models. In this paper, we therefore construct a fine-grained entity typing dataset based on multimodal information from the cybersecurity literature and design a multimodal representation learning model on top of it. Specifically, we introduce a new benchmark dataset called CySets to facilitate the study of this task, and we train a novel multimodal representation learning model called Cyst-MMET with multitask objectives. The model utilizes multimodal knowledge from the literature and external sources to unify visual and textual representations, eliminating visual noise through a multi-level fusion encoder and thereby alleviating the data-bottleneck and long-tail problems in fine-grained entity typing. Experimental results show that CySets has sharper hierarchies and more diverse labels than existing datasets. Across all datasets, our model achieves state-of-the-art performance, with gains of up to 3%, demonstrating that it is effective in predicting entity types at different granularities.


Data availability

All data generated or analyzed during this study are included in this published article.

Notes

  1. https://webofscience.com/

  2. https://en.wikipedia.org/wiki/Computer_security

  3. In this paper, “/” denotes entity types at the Fine-grained level in CySets, and “./” denotes entity types at the Ultra-fine level in CySets.

  4. In the derivation of the attention mechanism, only the key steps are shown. Without loss of generality, the softmax scaling factor \(\sqrt{d}\) is omitted.

  5. https://github.com/INK-USC/PLE/blob/master/Data/README.md

  6. https://github.com/allenai/scibert

  7. https://allennlp.org/elmo

  8. https://drive.google.com/file/d/1mNM0UEt7D-e9hsMEVRQzxJ277Dt5GB7v/view
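Note 4 omits the softmax scaling factor \(\sqrt{d}\) from the derivation for brevity. For reference, the full scaled dot-product attention that the derivation abbreviates can be sketched in a few lines of NumPy; this is a minimal illustrative sketch, not the paper's implementation, and the variable names and shapes are assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V, with the sqrt(d) factor Note 4 omits."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # query-key similarity, scaled
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # attention-weighted sum of values

# Illustrative shapes: 3 queries, 5 key/value pairs, dimension d = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Dividing by \(\sqrt{d}\) keeps the dot products from growing with the dimension, which would otherwise push the softmax into near-one-hot saturation; ignoring it, as Note 4 does, changes only the temperature of the softmax, not the structure of the derivation.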


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61862063, 61502413, and 61262025; the Science Foundation of Young and Middle-aged Academic and Technical Leaders of Yunnan under Grant No. 202205AC160040; the Science Foundation of Yunnan Jinzhi Expert Workstation under Grant No. 202205AF150006; the Major Project of Yunnan Natural Science Foundation under Grant No. 202202AE090066; the Science and Technology Project of Yunnan Power Grid Co., Ltd. under Grant No. YNKJXM20222254; and the Science Foundation of the “Knowledge-driven intelligent software engineering innovation team”.


Author information

Authors and Affiliations

Authors

Contributions

In this paper, we constructed a fine-grained entity typing dataset based on multimodal information from the cybersecurity literature and designed a multimodal representation learning model on top of it. Baolei Wang completed the model design and experimental analysis and wrote the core chapters of the paper. Xuan Zhang is the corresponding author and provided conceptual guidance for this article. Gao Chen completed the second chapter, and Jishu Wang and Linyu Li completed Figs. 9, 10, and 11 in the fourth part. Qing Duan provided grammatical help with the writing. All authors composed the rest of the manuscript and reviewed the whole manuscript.

Corresponding author

Correspondence to Xuan Zhang.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, B., Zhang, X., Wang, J. et al. Fine-grained cybersecurity entity typing based on multimodal representation learning. Multimed Tools Appl 83, 30207–30232 (2024). https://doi.org/10.1007/s11042-023-16839-z

