Static analysis of source code security: Assessment of tools against SAMATE tests

https://doi.org/10.1016/j.infsof.2013.02.005

Abstract

Context

Static analysis tools are used to discover security vulnerabilities in source code. They suffer from false negatives and false positives. A false positive is a reported vulnerability that is not actually a security problem. A false negative is a vulnerability in the code that the tool fails to detect.

Objective

The main goal of this article is to provide objective assessment results, following a well-defined and repeatable methodology, on the vulnerability-detection performance of static analysis tools. The study compares the performance of nine tools (CBMC, K8-Insight, PC-lint, Prevent, Satabs, SCA, Goanna, Cx-enterprise, Codesonar), most of them commercial tools, each with a different design.

Method

We executed the static analysis tools against SAMATE Reference Dataset test suites 45 and 46 for the C language. One suite contains test cases with known vulnerabilities; the other contains versions in which those specific vulnerabilities have been fixed. The results are then analyzed using a set of well-known metrics.

Results

Only SCA is designed to detect all the vulnerability categories considered in SAMATE. None of the tools detects “cross-site scripting” vulnerabilities. The best F-measure results are obtained by Prevent, SCA and K8-Insight. The average precision of the analyzed tools is 0.7 and the average recall is 0.527. The differences between the tools are substantial, as they detect different kinds of vulnerabilities.

Conclusions

The results provide empirical evidence supporting popular propositions that had not been objectively demonstrated until now. The methodology is repeatable and allows a strict ranking of the analyzed static analysis tools in terms of vulnerability coverage and effectiveness, that is, detecting the highest number of vulnerabilities while producing few false positives. It can help practitioners select appropriate tools for a code security review process. We also propose recommendations for improving the reliability and usefulness of static analysis tools and the benchmarking process.

Introduction

The world today faces some of its biggest security challenges because of different kinds of computer risks: from identity theft to spyware, from traditional sniffing of sensitive information to electronic warfare. Many industries depend increasingly on software coming from vendors, open-source projects or third parties. The number of systems with complex interactions is growing as well, including operating system and application patches delivered over the Internet or through collaboration across distributed sites. The security needs of avionic systems, Black [1], or the recent Flame virus attack [2] demonstrate that information security is a multifaceted problem with a common factor: users need security assurance on the software they use.

Software engineers must consider a variety of strategies to build sufficiently dependable software before release. Indeed, the tools used for software development and maintenance can supply developers with information for assurance cases. This information must be gathered to make software secure enough for its intended use. Among these tools, automatic static source code security scanners can be used to examine legacy code and also as a routine task within the software development lifecycle. Static analysis of source code is a fault-detection technique [3] that does not require program execution. Automated static security analysis tools for source code are increasingly used today and taken into account in software development strategies, following a simple scheme as shown in Fig. 1. They are designed to detect flaws, faults, bugs and other errors in software code that, if left unaddressed, could lead to exploitable security vulnerabilities.

Nevertheless, static source code security analysis tools are far from being standard tools. Their internal design determines which set of vulnerabilities they can detect. Commercial tools publish little information about their design and their code is not accessible. Each tool classifies the vulnerabilities it finds differently, which makes comparing results difficult. A real benchmark is necessary to compare their performance in terms of the number of detected vulnerabilities (true and false positives) and their usability, and to state independently which are the best for each environment. Finally, the claim that these tools can be used by any developer, even one without previous information security experience, is doubtful. In line with Chess and West [4] and Howard [5], and against the claims of some vendors, it is relevant to know whether an experienced information security user is necessary to get the best results from these tools.

These tools have begun to be part of the toolset of many software development teams, although a clear understanding of their strengths and limitations is still needed. This work aims to contribute to clarifying some aspects of this matter.

The first goal of this study is to provide objective assessment results, following a well-defined and repeatable methodology that allows an evaluation of the vulnerability-detection performance of static code security analyzers. The methodology uses a selected benchmark with a well-known set of security vulnerabilities. A tool performs best against a benchmark if it achieves the best balance between detecting the highest number of true positives and producing few false positives. The benchmark must be “repeatable, portable, scalable, representative, require minimum changes in the target tools and simple to use”, in accordance with Gray [52]. Several benchmark initiatives have been analyzed, and the NIST SAMATE Reference Dataset [6] project meets all these requirements. It compiles a suite of synthetic benchmarks with support for multiple platforms and for the C/C++, Java, J2EE and PHP languages, covering most vulnerability categories.

The second goal is to use the methodology to compare the performance of nine different tools for the C/C++ languages (CBMC, K8-Insight, PC-lint, Prevent, Satabs, SCA, Goanna, Cx-enterprise, Codesonar), each with a different design. The study examines their effectiveness in terms of the number of detected security vulnerabilities, using a selected and widely accepted set of metrics. This evaluation allows us to provide some practical recommendations on the use of these tools and on how to improve their effectiveness.

The study also seeks to characterize how automatic the behavior of the tools is, in other words, whether an experienced user is necessary to get the best performance from these tools. Some of these tools, especially the commercial ones, claim not to need security experts working on the results they produce, and we wanted to verify this claim.

The rest of the paper is organized as follows: Section 2 reviews previous work on static analysis comparisons and existing benchmarks; Section 3 presents a theoretical analysis of the advantages and disadvantages of current static source code security analysis tools; Section 4 describes the methodology used when working with the NIST SAMATE tests, the evaluation metrics and the static source code security analysis tools selected for our work; Section 5 shows the results of the evaluation of these tools against the SAMATE test suites; Section 6 offers a discussion of the main results of the study; and, finally, Section 7 draws some conclusions and recommendations about the work done and outlines possible future work.

Section snippets

Background and related work

This section reviews previous work on static analysis tools and on comparisons of their performance, and explains how the present work improves on these studies.

Static source code security analysis tools

This section describes some of the most significant features of current static source code security analyzers. We also review the diverse approaches used by different categories of tools and their drawbacks.

All these tools follow the same pattern when applied to a piece of source code (a minimal sketch follows the list):

  1. Transforming the code to be analyzed into a program model, a set of data structures that represent the code.

  2. Analyzing the model using different rules and/or properties.

  3. Showing the results to
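As an illustration only, the following sketch shows this three-step pattern as a toy rule-based checker built on Python's ast module; it is hypothetical and is not modeled on any of the C/C++ tools evaluated in this study.

```python
import ast

# Step 1: transform the code under analysis into a program model (here, an AST).
def build_model(source: str) -> ast.AST:
    return ast.parse(source)

# Step 2: analyze the model with a set of rules; each rule flags suspect nodes.
def rule_dangerous_call(node: ast.AST) -> bool:
    # Flag calls to eval(), a classic code-injection sink.
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval")

def analyze(model: ast.AST, rules) -> list:
    findings = []
    for node in ast.walk(model):
        for rule in rules:
            if rule(node):
                findings.append((rule.__name__, node.lineno))
    return findings

# Step 3: show the results (rule name and line number of each finding).
if __name__ == "__main__":
    code = "user_input = input()\nresult = eval(user_input)\n"
    for rule_name, line in analyze(build_model(code), [rule_dangerous_call]):
        print(f"line {line}: potential vulnerability ({rule_name})")
```

Real analyzers replace the syntax tree with richer program models (control-flow and data-flow graphs, symbolic states) and the single rule with hundreds of checkers, but the three stages remain the same.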

Assessment methodology

This section briefly describes the SAMATE initiative, the methodology for the comparison with SAMATE, the evaluation metrics and the tests selected for our study within SAMATE. Afterwards, we describe the static analysis tools we decided to compare. Finally, the main research questions of this work are presented. Fig. 3 shows our methodology step by step, illustrating the complete approach followed in this study.

The methodology obliged us to:

  • Use the list of the vulnerabilities each tool

Research question 1: How do the nine tools compare in terms of detecting security vulnerabilities of different categories on the SAMATE test suite 45?

Table 4 shows the vulnerability-detection statistics against the 78 specific cases of test suite 45, for each type of vulnerability and for the nine tools. Each type of vulnerability has a different number of test cases in the test suites, which could favor one tool over another. To normalize the detection results, we calculate the percentage of detections (recall) for each type of vulnerability that each tool is designed to detect. The last row of Table 4 also shows the arithmetic mean
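This normalization step can be sketched as follows; the per-category counts below are hypothetical and are not the values reported in Table 4.

```python
# Hypothetical detection counts for one tool: for each vulnerability category
# it is designed to detect, the number of flawed test cases it reported (tp)
# out of the total flawed cases in that category (total).
detections = {
    "buffer_overflow": {"tp": 8, "total": 10},
    "format_string":   {"tp": 3, "total": 4},
    "memory_leak":     {"tp": 5, "total": 12},
}

# Per-category recall: fraction of the category's flawed cases the tool reports.
recall_per_category = {
    cat: counts["tp"] / counts["total"] for cat, counts in detections.items()
}

# Arithmetic mean of the per-category recalls, so that categories with many
# test cases do not dominate the overall score.
mean_recall = sum(recall_per_category.values()) / len(recall_per_category)

for cat, r in recall_per_category.items():
    print(f"{cat}: recall = {r:.2f}")
print(f"arithmetic mean recall = {mean_recall:.2f}")
```

Averaging per-category recall, rather than pooling all test cases, keeps categories with many cases from dominating a tool's overall score.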

Discussion

In this section we discuss the main results of the evaluation of the nine static source code analysis tools.

Conclusions

The present study provides objective evidence of the performance of static analysis tools, using a well-defined benchmark test suite and a repeatable methodology, and reports results for state-of-the-art tools.

The methodology applies widely known metrics based on the rates of true and false positives and on the vulnerability coverage of the tools, producing a strict performance ranking of static analysis tools. A company can then choose a tool by analyzing the precision, recall, F
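For reference, the standard definitions of precision, recall and the F-measure used in this kind of evaluation are sketched below; the counts in the example are hypothetical and are not taken from the paper's results.

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Compute precision, recall and the balanced F-measure from
    true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return precision, recall, f_measure

# Hypothetical example: a tool reports 35 findings, 28 of which are real
# vulnerabilities, and misses 22 vulnerabilities present in the test suite.
p, r, f = precision_recall_f(tp=28, fp=7, fn=22)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

The F-measure rewards a balance between the two rates, which is why it summarizes the trade-off between reporting many true vulnerabilities and keeping false positives low.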

Acknowledgment

The authors acknowledge the support provided by e-Madrid Project, S2009/TIC-1650, “Investigación y Desarrollo de tecnologías para el e-learning en la Comunidad de Madrid.”

References (75)

  • S. Heckman et al., A systematic literature review of actionable alert identification techniques for automated static code analysis, Information and Software Technology (2011)
  • P.E. Black, Software assurance with SAMATE reference dataset, tool standards and studies, in: Proc. 26th IEEE/AIAA...
  • Flame Virus,...
  • IEEE, IEEE Standard Glossary of Software Engineering Terminology, IEEE Standard 610.12-1990,...
  • B. Chess et al., Secure Programming with Static Analysis (2007)
  • M.A. Howard, A process for performing security code reviews, IEEE Security & Privacy (2006)
  • SAMATE, 2005, Software Assurance Metrics And Tool Evaluation, National Institute of Standards and Technology,...
  • N. Nagappan, T. Ball, Static analysis tools as early indicators of pre-release defect density, in: Proc. 27th...
  • D. Hovemeyer, J. Spacco, W. Pugh, Evaluating and tuning a static analysis to find null pointer bugs, in: Proc. 6th ACM...
  • N. Ayewah et al., Using static analysis to find bugs, IEEE Software (2008)
  • M.S. Ware, C.J. Fox, Securing Java code: heuristics and an evaluation of static analysis tools, in: Conference on...
  • J. Zheng et al., On the value of static analysis for fault detection in software, IEEE Transactions on Software Engineering (2006)
  • M. Howard et al., Writing Secure Code (2003)
  • J. Pescatore, Require vulnerability testing during software development, Gartner Research Note,...
  • B. Chess et al., Static analysis for security, IEEE Security & Privacy (2004)
  • P. Chandra et al., Putting the tools to work: how to succeed with source code analysis, IEEE Security & Privacy (2006)
  • G. McGraw, Software Security: Building Security In (2006)
  • D.N. Kleidermacher, Integrating static analysis into a secure software development process, in: IEEE Conference on...
  • NIST SP 500-279, NIST Special Publication 500-279, 2009, Static Analysis Tool Exposition (SATE) 2008,...
  • NIST Interagency Report 7755, 2010, Toward a Preliminary Framework for Assessing the Trustworthiness of Software....
  • NIST SP 500-287, NIST Special Publication 500-287, 2010, The Second Static Analysis Tool Exposition (SATE) 2009....
  • A. Aggarwal, P. Jalote, Integrating static and dynamic analysis for detecting vulnerabilities, in: Proc. 30th Annual...
  • A. Petukhov, D. Kozlov, Detecting Security Vulnerabilities in Web Applications Using Dynamic Analysis with Penetration...
  • SANS TOP 25. <http://www.sans.org/top25-software-errors/> (accessed September...
  • K. Kratkiewicz, Evaluating Static Analysis Tools for Detecting Buffer Overflows in C Code, Master's Thesis, Harvard...
  • MathWorks Polyspace. <http://www.mathworks.es/products/polyspace/> (accessed December...
  • X. Yichen, A. Chou, D. Engler, ARCHER: using symbolic, path-sensitive analysis to detect memory access errors, in:...
  • G. Holzmann, UNO: static source code checking for user-defined properties, in: 6th World Conf. on Integrated Design and...
  • D. Evans, D. Larochelle, Improving security using extensible lightweight static analysis, IEEE Software...
  • D. Wagner, J. Foster, E. Brewer, A. Aiken, A First Step Towards Automated Detection of Buffer Overrun Vulnerabilities,...
  • M. Zitser, R. Lippmann, T. Leek, Testing static analysis tools using exploitable buffer overflows from open source...
  • P. Emanuelsson, U. Nilsson, A Comparative Study of Industrial Static Analysis Tools (Extended Version), Technical...
  • Coverity Products, <http://www.coverity.com/products/coverity-prevent.html> (accessed September...
  • Klocwork Insight, <http://www.klocwork.com/products/insight/> (accessed September...
  • T. Hofer, Evaluating Static Source Code Analysis Tools, École Polytechnique Fédérale de Lausanne, 2010....
  • N. Rutar, C.B. Almazan, J.S. Foster, A comparison of bug finding tools for Java, 2004, in: Proceedings of the...
  • S. Wagner, J. Jürjens, C. Koller, P. Trischberger, 2005, Comparing bug finding tools with reviews and tests, in:...