research-article

A Compact Vulnerability Knowledge Graph for Risk Assessment

Published: 31 July 2024

Abstract

Software vulnerabilities, also known as flaws, bugs, or weaknesses, are common in modern information systems and put the critical data of organizations and individuals at cyber risk. Because resources are scarce, initial risk assessment has become a necessary step for prioritizing vulnerabilities and making better decisions on remediation, mitigation, and patching. Datasets containing historical vulnerability information are crucial digital assets for AI-based risk assessment. However, existing datasets focus on collecting information about individual vulnerabilities and simply store it in relational databases, disregarding the structural connections between them. This article constructs a compact vulnerability knowledge graph, VulKG, containing over 276K nodes and 1M relationships that represent the connections between vulnerabilities, exploits, affected products, vendors, referred domain names, and more. We provide a detailed analysis of how VulKG is modeled and constructed, demonstrate VulKG-based querying and reasoning, and present a use case that applies VulKG to a vulnerability risk assessment task, namely co-exploitation behavior discovery. Experimental results demonstrate the value of graph connections in vulnerability risk assessment tasks. VulKG opens opportunities for further research in areas related to vulnerability risk assessment. The data and code for this article are available at https://github.com/happyResearcher/VulKG.git.
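The abstract describes VulKG only at a high level. To illustrate what explicit graph connections add over per-vulnerability records, here is a minimal Python sketch under stated assumptions: the relation names (EXPLOITS, AFFECTS, MADE_BY), the toy node IDs, and the rule that two vulnerabilities count as co-exploited when a single exploit targets both are all hypothetical and do not reproduce VulKG's actual schema, which is documented in the linked repository. The ranking step uses a plain common-neighbors heuristic as a stand-in for link prediction; it is not the model evaluated in the article.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy triples (source, relation, target). Node IDs and relation
# names are illustrative assumptions, not VulKG's schema or data.
EDGES = [
    ("EDB-1", "EXPLOITS", "CVE-A"),
    ("EDB-1", "EXPLOITS", "CVE-B"),
    ("EDB-2", "EXPLOITS", "CVE-B"),
    ("EDB-2", "EXPLOITS", "CVE-C"),
    ("CVE-A", "AFFECTS", "prod:x"),
    ("CVE-B", "AFFECTS", "prod:x"),
    ("CVE-C", "AFFECTS", "prod:y"),
    ("CVE-D", "AFFECTS", "prod:y"),
    ("prod:x", "MADE_BY", "vendor:v1"),
    ("prod:y", "MADE_BY", "vendor:v2"),
]


class TinyKG:
    """Minimal in-memory store for a typed, directed multi-relational graph."""

    def __init__(self, edges):
        # out[src][rel] -> set of targets; inc[dst][rel] -> set of sources
        self.out = defaultdict(lambda: defaultdict(set))
        self.inc = defaultdict(lambda: defaultdict(set))
        for src, rel, dst in edges:
            self.out[src][rel].add(dst)
            self.inc[dst][rel].add(src)

    def vulns_of_vendor(self, vendor):
        """Two-hop query: Vendor <-MADE_BY- Product <-AFFECTS- Vulnerability."""
        products = self.inc.get(vendor, {}).get("MADE_BY", set())
        return {v for p in products
                for v in self.inc.get(p, {}).get("AFFECTS", set())}

    def co_exploited_pairs(self):
        """Vulnerability pairs targeted by at least one shared exploit (assumed rule)."""
        pairs = set()
        for rels in self.out.values():
            targets = sorted(rels.get("EXPLOITS", ()))
            pairs.update(combinations(targets, 2))
        return pairs

    def neighbors(self, node):
        """All adjacent nodes, ignoring edge direction and relation type."""
        nbrs = set()
        for targets in self.out.get(node, {}).values():
            nbrs |= targets
        for sources in self.inc.get(node, {}).values():
            nbrs |= sources
        return nbrs

    def common_neighbor_score(self, u, v):
        """Common-neighbors link-prediction heuristic (a simple baseline)."""
        return len(self.neighbors(u) & self.neighbors(v))

    def rank_candidate_links(self, vulns):
        """Score vulnerability pairs not already co-exploited, highest first."""
        known = self.co_exploited_pairs()
        candidates = [
            (self.common_neighbor_score(u, v), u, v)
            for u, v in combinations(sorted(vulns), 2)
            if (u, v) not in known
        ]
        return sorted(candidates, reverse=True)


if __name__ == "__main__":
    kg = TinyKG(EDGES)
    print(sorted(kg.vulns_of_vendor("vendor:v1")))  # ['CVE-A', 'CVE-B']
    print(sorted(kg.co_exploited_pairs()))  # [('CVE-A', 'CVE-B'), ('CVE-B', 'CVE-C')]
    print(kg.rank_candidate_links(["CVE-A", "CVE-B", "CVE-C", "CVE-D"])[0])  # (1, 'CVE-C', 'CVE-D')
```

The vendor query shows the kind of multi-hop question a relational table of individual CVEs cannot answer without joins, and the ranking step shows how co-exploitation discovery can be framed as predicting missing vulnerability-to-vulnerability links. The common-neighbors score is deliberately simple; a learned link predictor would take the place of `rank_candidate_links` in a real experiment.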

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 8
September 2024
700 pages
EISSN: 1556-472X
DOI: 10.1145/3613713

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 July 2024
Online AM: 05 June 2024
Accepted: 31 May 2024
Revised: 27 May 2024
Received: 16 March 2023
Published in TKDD Volume 18, Issue 8

Author Tags

  1. Knowledge graph
  2. vulnerability risk assessment
  3. vulnerability co-exploitation
  4. link prediction

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 652
  • Downloads (last 6 weeks): 119
Reflects downloads up to 05 Mar 2025.

Cited By

  • (2025) SecKG2vec: A novel security knowledge graph relational reasoning method based on semantic and structural fusion embedding. Computers & Security 149 (104192). DOI: 10.1016/j.cose.2024.104192. Online publication date: Feb 2025.
  • (2024) Cybersecurity Risk. Reference Module in Social Sciences. DOI: 10.1016/B978-0-443-13701-3.00550-8. Online publication date: 2024.
  • (2024) Correlation Between Macro Economic Variables and Financial Sector Australian Share Market Index. Databases Theory and Applications, 239–249. DOI: 10.1007/978-981-96-1242-0_18. Online publication date: 17 Dec 2024.
  • (2024) From Data to Insights: Constructing and Evaluating a Hospitality Dataset for Quadruple Aspect-Based Sentiment Analysis. Web Information Systems Engineering – WISE 2024, 102–113. DOI: 10.1007/978-981-96-0579-8_8. Online publication date: 2 Dec 2024.
  • (2024) EBUD: Evolving Disaster Burst Detection over Social Streams. Web Information Systems Engineering – WISE 2024, 105–115. DOI: 10.1007/978-981-96-0567-5_9. Online publication date: 2 Dec 2024.
