ABSTRACT
Software vulnerabilities are weaknesses in specific products and versions that attackers may exploit to gain access to operating systems, devices, and users’ data. Because not all vulnerabilities pose a real threat to such information, this research explores ways to identify the ones likely to be exploited, using only their textual descriptions. The practical goal of the experiments is to examine new raw descriptions and predict whether the associated product is likely to be exploited. The ground truth is the existence or absence of references that report exploitation concepts for the related weaknesses. To this end, we use Natural Language Processing (NLP) techniques, feature-evaluation filtering mechanisms, and an oversampling method to transform the raw texts into inputs for classification models and to detect the most important terms. The results are promising: many of the constructed models achieved acceptable overall accuracy on the collected dataset.
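The preprocessing steps named in the abstract (text vectorization, filter-based term selection, oversampling) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the toy descriptions and labels are invented, TF-IDF with a smoothed IDF stands in for the NLP step, a mean-weight difference between classes stands in for the filter, and random duplication stands in for the oversampling method. It is not the authors' implementation.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy data: 1 = exploitation reference exists, 0 = none.
docs = [
    "Buffer overflow allows remote attackers to execute arbitrary code",
    "SQL injection allows remote attackers to execute arbitrary commands",
    "Improper input validation causes a denial of service crash",
    "Memory leak causes a denial of service",
    "Null pointer dereference causes a crash",
    "Race condition causes a denial of service hang",
]
labels = [1, 1, 0, 0, 0, 0]

def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(texts):
    """Return one {term: weight} dict per text (relative TF times smoothed IDF)."""
    tokenized = [tokenize(t) for t in texts]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n = len(texts)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
    return [
        {t: (cnt / len(toks)) * idf[t] for t, cnt in Counter(toks).items()}
        for toks in tokenized
    ]

def select_top_terms(vectors, y, k):
    """Filter step (stand-in): rank terms by |mean weight in class 1 - class 0|."""
    pos = [v for v, lab in zip(vectors, y) if lab == 1]
    neg = [v for v, lab in zip(vectors, y) if lab == 0]
    vocab = {t for v in vectors for t in v}

    def mean(group, term):
        return sum(v.get(term, 0.0) for v in group) / len(group)

    scores = {t: abs(mean(pos, t) - mean(neg, t)) for t in vocab}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def random_oversample(rows, y, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    out_rows, out_y = list(rows), list(y)
    for cls, cnt in counts.items():
        pool = [r for r, lab in zip(rows, y) if lab == cls]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))
            out_y.append(cls)
    return out_rows, out_y

vectors = tfidf_vectors(docs)
top_terms = select_top_terms(vectors, labels, k=6)
# Dense feature rows restricted to the selected terms.
rows = [[v.get(t, 0.0) for t in top_terms] for v in vectors]
balanced_rows, balanced_labels = random_oversample(rows, labels)

print("selected terms:", top_terms)
print("class counts after oversampling:", Counter(balanced_labels))
```

The resulting `balanced_rows` and `balanced_labels` are the kind of input a classifier would be trained on. In a real experiment the oversampling would be applied only to the training split, so that duplicated samples do not leak into the evaluation data.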
Index Terms
- Predicting the existence of exploitation concepts linked to software vulnerabilities using text mining