research-article

Predicting the existence of exploitation concepts linked to software vulnerabilities using text mining

Authors:
Konstantinos Charmanas, School of Informatics, Aristotle University of Thessaloniki, Greece
Nikolaos Mittas, Department of Chemistry, International Hellenic University, Greece
Lefteris Angelis, School of Informatics, Aristotle University of Thessaloniki, Greece

PCI '21: Proceedings of the 25th Pan-Hellenic Conference on Informatics, November 2021, Pages 352–356, https://doi.org/10.1145/3503823.3503888

Published: 22 February 2022

ABSTRACT

Software vulnerabilities are weaknesses in specific products and versions that attackers may exploit, with the further goal of gaining access to operating systems, devices and users' data. Since not all vulnerabilities pose a real threat to such information, this research explores ways to identify those that are likely to be exploited, using only their textual descriptions. The practical goal of the experiments is to examine new, raw descriptions in order to predict whether the linked product is likely to be exploited. The ground truth is the presence or absence of references that report exploitation concepts for the related weaknesses. To meet these objectives, we use Natural Language Processing (NLP) techniques, filter-based feature evaluation mechanisms and an oversampling method to turn the raw texts into inputs for classification models and to detect the most important terms. The results are promising, as many of the constructed models achieved acceptable overall accuracy on the collected dataset.
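The pipeline outlined in the abstract (text preprocessing, term weighting, filter-based feature selection and oversampling) can be illustrated with a minimal, standard-library-only Python sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the toy descriptions and labels in `DOCS` are invented, the stop-word list is arbitrary, TF-IDF stands in for the unspecified NLP representation, mutual information stands in for the unnamed filter criterion, and plain random oversampling of the minority class stands in for the unnamed oversampling method (synthetic approaches such as SMOTE or ADASYN generate new points instead of duplicating rows). The helper names `tokenize`, `tfidf`, `mutual_information`, `select_terms` and `random_oversample` are hypothetical.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy data (NOT the paper's dataset): (description, label),
# where label 1 means an exploitation-related reference exists.
DOCS = [
    ("Buffer overflow allows remote attackers to execute arbitrary code", 1),
    ("SQL injection allows remote attackers to execute arbitrary SQL commands", 1),
    ("Cross-site scripting allows attackers to inject arbitrary web script", 0),
    ("Information disclosure via crafted request in the admin panel", 0),
    ("Denial of service via malformed packet in the parser", 0),
    ("Improper input validation leads to denial of service", 0),
]

# Arbitrary illustrative stop-word list (an assumption).
STOPWORDS = {"the", "to", "in", "via", "of", "a", "an", "allows", "leads"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(token_lists):
    """One {term: weight} dict per document, weight = tf * log(N / df)."""
    n = len(token_lists)
    df = Counter(term for toks in token_lists for term in set(toks))
    vectors = []
    for toks in token_lists:
        total = len(toks) or 1  # avoid division by zero on empty documents
        tf = Counter(toks)
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def mutual_information(token_lists, labels, term):
    """Mutual information (bits) between presence of `term` and the label."""
    n = len(labels)
    present = [term in toks for toks in token_lists]
    joint = Counter(zip(present, labels))
    p_x = Counter(present)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


def select_terms(token_lists, labels, k):
    """Filter-style selection: keep the k terms with the highest MI score."""
    vocab = sorted({t for toks in token_lists for t in toks})
    ranked = sorted(vocab, key=lambda t: (-mutual_information(token_lists, labels, t), t))
    return ranked[:k]


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until the two classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [r for r, y in zip(rows, labels) if y == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority] * deficit


if __name__ == "__main__":
    texts = [text for text, _ in DOCS]
    labels = [label for _, label in DOCS]
    tokens = [tokenize(t) for t in texts]
    keep = select_terms(tokens, labels, k=2)
    # Restrict each TF-IDF vector to the selected terms.
    vectors = [{t: w for t, w in v.items() if t in keep} for v in tfidf(tokens)]
    X, y = random_oversample(vectors, labels)
    print("selected terms:", keep)
    print("class counts:", dict(Counter(y)))
```

On this toy data the filter should keep `execute` and `remote`, the only terms that occur exclusively in the positively labelled descriptions, and oversampling should leave four rows per class. In a real evaluation, oversampling would be applied only to the training split, so that duplicated rows do not leak into the test data.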

Published in
PCI '21: Proceedings of the 25th Pan-Hellenic Conference on Informatics
November 2021
499 pages
ISBN:9781450395557
DOI:10.1145/3503823
Editors:
Michael Gr. Vassilakopoulos,
Nikitas N. Karanikolas,
George Stamoulis,
Vassilios S. Verykios,
Cleo Sgouropoulou
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 February 2022
Author Tags
data oversampling
feature selection
machine learning
natural language processing
vulnerability exploitation
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 190 of 390 submissions, 49%
Article Metrics
- Total Citations: 0
- Total Downloads: 67
- Downloads (last 12 months): 31
- Downloads (last 6 weeks): 2
Cited By
This publication has not been cited yet.
