DOI: 10.1145/3503823.3503888
research-article

Predicting the existence of exploitation concepts linked to software vulnerabilities using text mining

Published: 22 February 2022

ABSTRACT

Software vulnerabilities are weaknesses in specific products and versions that attackers may exploit to gain access to operating systems, devices and users’ data. As not all vulnerabilities constitute potential threats to such information, this research explores ways to identify the ones that are likely to be exploited, using only their textual descriptions. The practical goal of the experiments is to examine future raw descriptions in order to predict whether the linked product is likely to be exploited or not. The ground truth examined is the existence or absence of references that report exploitation concepts of the related weaknesses. To meet our objectives, we use Natural Language Processing (NLP) techniques, feature evaluation filtering mechanisms and an oversampling method to adapt the raw texts into inputs for classification models and to detect the most important terms. The results are promising, as many of the constructed models achieved an acceptable overall accuracy on the collected dataset.
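
As a rough illustration of two of the steps named above, feature evaluation filtering and oversampling, the following sketch scores terms with a chi-square statistic and balances the classes by duplicating minority examples. Both choices are assumptions made for brevity: the abstract does not state which filter or which oversampling method was used, and the toy descriptions, labels and function names are invented for this example. The classification step is left out.

```python
import random
from collections import Counter

# Toy, made-up descriptions (not from the paper's dataset).
# Label 1 = a reference reporting exploitation exists, 0 = none.
DOCS = [
    ("remote attacker can execute arbitrary code via crafted packet", 1),
    ("buffer overflow allows remote code execution", 1),
    ("local user may cause denial of service via long filename", 0),
    ("improper validation leads to information disclosure", 0),
    ("denial of service when parsing malformed header", 0),
    ("incorrect permission check discloses configuration information", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def chi_square_scores(docs):
    """Chi-square statistic of each term's presence against the class label."""
    n = len(docs)
    n_pos = sum(label for _, label in docs)
    present = Counter()      # number of documents containing the term
    present_pos = Counter()  # number of positive documents containing the term
    for text, label in docs:
        for term in set(tokenize(text)):
            present[term] += 1
            present_pos[term] += label
    scores = {}
    for term, df in present.items():
        a = present_pos[term]  # term present, positive class
        b = df - a             # term present, negative class
        c = n_pos - a          # term absent, positive class
        d = n - n_pos - b      # term absent, negative class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def random_oversample(docs, seed=0):
    """Duplicate randomly chosen minority examples until both classes match.

    Plain random duplication, used here as a stand-in for whichever
    oversampling method the study actually applied.
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for doc in docs:
        by_label[doc[1]].append(doc)
    minority, majority = sorted(by_label.values(), key=len)
    if not minority:
        return list(docs)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(docs) + extra


scores = chi_square_scores(DOCS)
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
balanced = random_oversample(DOCS)
```

On this toy data the two highest-scoring terms are the ones that occur in every positive description and in no negative one, which is the behaviour a filter of this kind is meant to surface.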

References

  1. Booth, H., Rike, D., & Witte, G. A. (2013). The national vulnerability database (NVD): Overview.
  2. Almukaynizi, M., Nunes, E., Dharaiya, K., Senguttuvan, M., Shakarian, J., & Shakarian, P. (2017, November). Proactive identification of exploits in the wild through vulnerability mentions online. In 2017 International Conference on Cyber Conflict (CyCon US) (pp. 82-88). IEEE.
  3. Bozorgi, M., Saul, L. K., Savage, S., & Voelker, G. M. (2010, July). Beyond heuristics: learning to classify vulnerabilities and predict exploits. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 105-114).
  4. Fang, Y., Liu, Y., Huang, C., & Liu, L. (2020). FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm. Plos one, 15(2), e0228439.
  5. Bhatt, N., Anand, A., & Yadavalli, V. S. (2021). Exploitability prediction of software vulnerabilities. Quality and Reliability Engineering International, 37(2), 648-663.
  6. Almukaynizi, M., Nunes, E., Dharaiya, K., Senguttuvan, M., Shakarian, J., & Shakarian, P. (2019). Patch before exploited: An approach to identify targeted software vulnerabilities. In AI in Cybersecurity (pp. 81-113). Springer, Cham.
  7. Toloudis, D., Spanos, G., & Angelis, L. (2016, June). Associating the severity of vulnerabilities with their description. In International Conference on Advanced Information Systems Engineering (pp. 231-242). Springer, Cham.
  8. Spanos, G., & Angelis, L. (2018). A multi-target approach to estimate software vulnerability characteristics and severity scores. Journal of Systems and Software, 146, 152-166.
  9. Wu, H. C., Luk, R. W. P., Wong, K. F., & Kwok, K. L. (2008). Interpreting tf-idf term weights as making relevance decisions. ACM Transactions on Information Systems (TOIS), 26(3), 1-37.
  10. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).
  11. Le, Q., & Mikolov, T. (2014, June). Distributed representations of sentences and documents. In International conference on machine learning (pp. 1188-1196). PMLR.
  12. Pennington, J., Socher, R., & Manning, C. D. (2014, October). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).
  13. Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
  14. Bennasar, M., Hicks, Y., & Setchi, R. (2015). Feature selection using joint mutual information maximisation. Expert Systems with Applications, 42(22), 8520-8532.
  15. Bommert, A., Sun, X., Bischl, B., Rahnenführer, J., & Lang, M. (2020). Benchmark for filter methods for feature selection in high-dimensional classification data. Computational Statistics & Data Analysis, 143, 106839.
  16. He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on knowledge and data engineering, 21(9), 1263-1284.
  17. Kerdprasop, N., & Kerdprasop, K. (2012). On the generation of accurate predictive model from highly imbalanced data with heuristics and replication techniques. International journal of bio-science and bio-technology, 4(1), 49-64.
  18. Kouns, J. (2008). Open Source Vulnerability Database Project. Open Source Business Resource, (June 2008).
  19. Spanos, G., Sioziou, A., & Angelis, L. (2013, September). WIVSS: a new methodology for scoring information systems vulnerabilities. In Proceedings of the 17th panhellenic conference on informatics (pp. 83-90).
  20. Ikonomakis, M., Kotsiantis, S., & Tampakas, V. (2005). Text classification using machine learning techniques. WSEAS transactions on computers, 4(8), 966-974. [24] An Evaluation of Text Classification Methods for Literary Study, Bei Yu, Syracuse University.
  21. He, H., Bai, Y., Garcia, E. A., & Li, S. (2008, June). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence) (pp. 1322-1328). IEEE.
  22. Spanos, G., & Angelis, L. (2015). Impact metrics of security vulnerabilities: Analysis and weighing. Information Security Journal: A Global Perspective, 24(1-3), 57-71.

                Published in

                PCI '21: Proceedings of the 25th Pan-Hellenic Conference on Informatics
                November 2021
                499 pages

                Copyright © 2021 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Qualifiers

                • research-article
                • Research
                • Refereed limited

                Acceptance Rates

                Overall Acceptance Rate: 190 of 390 submissions, 49%