Static analysis of source code security: Assessment of tools against SAMATE tests

https://doi.org/10.1016/j.infsof.2013.02.005

Abstract

Context

Static analysis tools are used to discover security vulnerabilities in source code. They suffer from false negatives and false positives. A false positive is a reported vulnerability that is not actually a security problem. A false negative is a vulnerability in the code that the tool fails to detect.

Objective

The main goal of this article is to provide objective assessment results, following a well-defined and repeatable methodology, on the vulnerability-detection performance of static analysis tools. The study compares the performance of nine tools (CBMC, K8-Insight, PC-lint, Prevent, Satabs, SCA, Goanna, Cx-enterprise, Codesonar), most of them commercial tools, each with a different design.

Method

We executed the static analysis tools against SAMATE Reference Dataset test suites 45 and 46 for the C language. One suite contains test cases with known vulnerabilities; the other contains versions in which those specific vulnerabilities have been fixed. The results are then analyzed using a set of well-known metrics.

Results

Only SCA is designed to detect all the vulnerability categories considered in SAMATE. None of the tools detects “cross-site scripting” vulnerabilities. The best F-measure results are obtained by Prevent, SCA and K8-Insight. The average precision of the analyzed tools is 0.7 and the average recall is 0.527. The differences between the tools are substantial, as they detect different kinds of vulnerabilities.

Conclusions

The results provide empirical evidence supporting popular propositions that had not been objectively demonstrated until now. The methodology is repeatable and allows a strict ranking of the analyzed static analysis tools in terms of vulnerability coverage and effectiveness, that is, detecting the highest number of vulnerabilities while producing few false positives. It can help practitioners select appropriate tools for a code security review process. We also propose recommendations for improving the reliability and usefulness of static analysis tools and the benchmarking process.

Introduction

The world today faces some of its biggest security challenges because of different kinds of computer risks: from identity theft to spyware, from traditional sniffing of sensitive information to electronic warfare. Many industries depend increasingly on software coming from vendors, open-source projects or third parties. The number of systems with complex interactions is growing as well, including operating system and application patches delivered over the Internet or through collaboration across distributed sites. The security needs of avionic systems, Black [1], or the recent Flame virus attack [2] demonstrate that information security is a multifaceted problem with a common factor: users need security assurance on the software they use.

Software engineers must consider a variety of strategies to build sufficiently dependable software before release. Indeed, the tools used for software development and maintenance can supply developers with information for assurance cases. This information must be gathered to make software secure enough for its intended use. Among these tools, automatic static source code security scanners can be used to examine legacy code and also as a routine task within the software development lifecycle. Static analysis of source code is a fault-detection technique [3] that does not require program execution. Automated static security analysis tools for source code are increasingly used today and taken into account in software development strategies, following a simple scheme as shown in Fig. 1. They are designed to detect flaws, faults, bugs and other errors in software code that, if left unaddressed, could lead to exploitable security vulnerabilities.

Nevertheless, static source code security analysis tools are far from being standard tools. Their internal design determines which set of vulnerabilities they can detect. Commercial tools publish little information about their design and their code is not accessible. Each tool classifies the vulnerabilities it finds differently, which makes comparing results difficult. A real benchmark is necessary to compare their performance in terms of the number of detected vulnerabilities (true and false positives) and their usability, and to state independently which are the best for each environment. Finally, the claim that these tools can be used by any developer, even one without previous information security experience, is doubtful. In line with Chess and West [4] and Howard [5], and against the claims of some vendors, it is relevant to know whether an experienced information security user is necessary to get the best results from these tools.

These tools have begun to be part of the toolset of many software development teams, although a clear understanding of their strengths and limitations is still needed. This work aims to contribute to clarifying some aspects of this matter.

The first goal of this study is to provide objective assessment results, following a well-defined and repeatable methodology that allows an evaluation of the vulnerability-detection performance of static code security analyzers. The methodology uses a selected benchmark with a well-known set of security vulnerabilities. A tool performs best against a benchmark if it achieves the best balance between detecting the highest number of true positives and producing few false positives. The benchmark must be “repeatable, portable, scalable, representative, require minimum changes in the target tools and simple to use”, in accordance with Gray [52]. Several benchmark initiatives have been analyzed, and the NIST SAMATE Reference Dataset [6] project meets all these requirements. It compiles a suite of synthetic benchmarks with support for multiple platforms and for the C/C++, Java, J2EE and PHP languages, covering most vulnerability categories.

The second goal is to use the methodology to compare the performance of nine different tools for the C/C++ languages (CBMC, K8-Insight, PC-lint, Prevent, Satabs, SCA, Goanna, Cx-enterprise, Codesonar), each with a different design. The study examines their effectiveness in terms of the number of detected security vulnerabilities, using a selected and widely accepted set of metrics. This evaluation allows us to provide some practical recommendations on the use of these tools and on how to improve their effectiveness.

The study also seeks to characterize how automatic the behavior of the tools is, in other words, whether an experienced user is necessary to get the best performance from these tools. Some of these tools, especially the commercial ones, claim not to need security experts working on the results they produce, and we wanted to verify this claim.

The rest of the paper is organized as follows: Section 2 reviews previous work on static analysis comparisons and existing benchmarks; Section 3 presents a theoretical analysis of the advantages and disadvantages of current static source code security analysis tools; Section 4 describes the methodology used when working with the NIST SAMATE tests, the evaluation metrics and the static source code security analysis tools selected for our work; Section 5 shows the results of the evaluation of these tools against the SAMATE test suites; Section 6 offers a discussion of the main results of the study; and, finally, Section 7 draws some conclusions and recommendations about the work done and outlines possible future work.

Section snippets

Background and related work

This section reviews previous work on static analysis tools and on comparisons of their performance, and explains how the present work improves on these studies.

Static source code security analysis tools

This section describes some of the most significant features of current static source code security analyzers. We also review the diverse approaches used by different categories of tools and their drawbacks.

All these tools follow the same pattern when applied to a piece of source code (a minimal sketch follows the list):

  1. Transforming the code to be analyzed into a program model, a set of data structures that represent the code.

  2. Analyzing the model using different rules and/or properties.

  3. Showing the results to
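As an illustration only, the following sketch shows this three-step pattern as a toy rule-based checker built on Python's ast module; it is hypothetical and is not modeled on any of the C/C++ tools evaluated in this study.

```python
import ast

# Step 1: transform the code under analysis into a program model (here, an AST).
def build_model(source: str) -> ast.AST:
    return ast.parse(source)

# Step 2: analyze the model with a set of rules; each rule flags suspect nodes.
def rule_dangerous_call(node: ast.AST) -> bool:
    # Flag calls to eval(), a classic code-injection sink.
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval")

def analyze(model: ast.AST, rules) -> list:
    findings = []
    for node in ast.walk(model):
        for rule in rules:
            if rule(node):
                findings.append((rule.__name__, node.lineno))
    return findings

# Step 3: show the results (rule name and line number of each finding).
if __name__ == "__main__":
    code = "user_input = input()\nresult = eval(user_input)\n"
    for rule_name, line in analyze(build_model(code), [rule_dangerous_call]):
        print(f"line {line}: potential vulnerability ({rule_name})")
```

Real analyzers replace the syntax tree with richer program models (control-flow and data-flow graphs, symbolic states) and the single rule with hundreds of checkers, but the three stages remain the same.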

Assessment methodology

This section briefly describes the SAMATE initiative, the methodology for the comparison with SAMATE, the evaluation metrics and the tests selected for our study within SAMATE. Afterwards, we describe the static analysis tools we decided to compare. Finally, the main research questions of this work are presented. Fig. 3 shows our methodology step by step, illustrating the complete approach followed in this study.

The methodology obliged us to:

  • Use the list of the vulnerabilities each tool

Research question 1: How do the nine tools compare in terms of detecting security vulnerabilities of different categories on the SAMATE test suite 45?

Table 4 shows the vulnerability-detection statistics against the 78 specific cases of test suite 45, for each type of vulnerability and for the nine tools. Each type of vulnerability has a different number of test cases in the test suites, which could favor one tool over another. To normalize the detection results, we calculate the percentage of detections (recall) for each type of vulnerability that each tool is designed to detect. The last row of Table 4 also shows the arithmetic mean
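This normalization step can be sketched as follows; the per-category counts below are hypothetical and are not the values reported in Table 4.

```python
# Hypothetical detection counts for one tool: for each vulnerability category
# it is designed to detect, the number of flawed test cases it reported (tp)
# out of the total flawed cases in that category (total).
detections = {
    "buffer_overflow": {"tp": 8, "total": 10},
    "format_string":   {"tp": 3, "total": 4},
    "memory_leak":     {"tp": 5, "total": 12},
}

# Per-category recall: fraction of the category's flawed cases the tool reports.
recall_per_category = {
    cat: counts["tp"] / counts["total"] for cat, counts in detections.items()
}

# Arithmetic mean of the per-category recalls, so that categories with many
# test cases do not dominate the overall score.
mean_recall = sum(recall_per_category.values()) / len(recall_per_category)

for cat, r in recall_per_category.items():
    print(f"{cat}: recall = {r:.2f}")
print(f"arithmetic mean recall = {mean_recall:.2f}")
```

Averaging per-category recall, rather than pooling all test cases, keeps categories with many cases from dominating a tool's overall score.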

Discussion

In this section we discuss the main results of the evaluation of the nine static source code analysis tools.

Conclusions

The present study provides objective evidence of the performance of static analysis tools, using a well-defined benchmark test suite and a repeatable methodology, and reports results for state-of-the-art tools.

The methodology applies widely known metrics based on the rates of true and false positives and on the vulnerability coverage of the tools, producing a strict performance ranking of static analysis tools. A company can then choose a tool by analyzing the precision, recall, F
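For reference, the standard definitions of precision, recall and the F-measure used in this kind of evaluation are sketched below; the counts in the example are hypothetical and are not taken from the paper's results.

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Compute precision, recall and the balanced F-measure from
    true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return precision, recall, f_measure

# Hypothetical example: a tool reports 35 findings, 28 of which are real
# vulnerabilities, and misses 22 vulnerabilities present in the test suite.
p, r, f = precision_recall_f(tp=28, fp=7, fn=22)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

The F-measure rewards a balance between the two rates, which is why it summarizes the trade-off between reporting many true vulnerabilities and keeping false positives low.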

Acknowledgment

The authors acknowledge the support provided by e-Madrid Project, S2009/TIC-1650, “Investigación y Desarrollo de tecnologías para el e-learning en la Comunidad de Madrid.”

References (75)

  • S. Heckman et al., A systematic literature review of actionable alert identification techniques for automated static code analysis, Information and Software Technology (2011)
  • P.E. Black, Software assurance with SAMATE reference dataset, tool standards and studies, in: Proc. 26th IEEE/AIAA...
  • Flame Virus,...
  • IEEE, IEEE Standard Glossary of Software Engineering Terminology, IEEE Standard 610.12-1990,...
  • B. Chess et al., Secure Programming with Static Analysis (2007)
  • M.A. Howard, A process for performing security code reviews, IEEE Security & Privacy (2006)
  • SAMATE, 2005, Software Assurance Metrics And Tool Evaluation, National Institute of Standards and Technology,...
  • N. Nagappan, T. Ball, Static analysis tools as early indicators of pre-release defect density, in: Proc. 27th...
  • D. Hovemeyer, J. Spacco, W. Pugh, Evaluating and tuning a static analysis to find null pointer bugs, in: Proc. 6th ACM...
  • N. Ayewah et al., Using static analysis to find bugs, IEEE Software (2008)
  • M.S. Ware, C.J. Fox, Securing Java code: heuristics and an evaluation of static analysis tools, in: Conference on...
  • J. Zheng et al., On the value of static analysis for fault detection in software, IEEE Transactions on Software Engineering (2006)
  • M. Howard et al., Writing Secure Code (2003)
  • J. Pescatore, Require vulnerability testing during software development, Gartner Research Note,...
  • B. Chess et al., Static analysis for security, IEEE Security & Privacy (2004)
  • P. Chandra et al., Putting the tools to work: how to succeed with source code analysis, IEEE Security & Privacy (2006)
  • G. McGraw, Software Security: Building Security In (2006)
  • D.N. Kleidermacher, Integrating static analysis into a secure software development process, in: IEEE Conference on...
  • NIST SP 500-279, NIST Special Publication 500-279, 2009, Static Analysis Tool Exposition (SATE) 2008,...
  • NIST Interagency Report 7755, 2010, Toward a Preliminary Framework for Assessing the Trustworthiness of Software....
  • NIST SP 500-287, NIST Special Publication 500-287, 2010, The Second Static Analysis Tool Exposition (SATE) 2009....
  • A. Aggarwal, P. Jalote, Integrating static and dynamic analysis for detecting vulnerabilities, in: Proc. 30th Annual...
  • A. Petukhov, D. Kozlov, Detecting Security Vulnerabilities in Web Applications Using Dynamic Analysis with Penetration...
  • SANS TOP 25. <http://www.sans.org/top25-software-errors/> (accessed September...
  • K. Kratkiewicz, Evaluating Static Analysis Tools for Detecting Buffer Overflows in C Code, Master's Thesis, Harvard...
  • MathWorks Polyspace. <http://www.mathworks.es/products/polyspace/> (accessed December...
  • X. Yichen, A. Chou, D. Engler, ARCHER: using symbolic, path-sensitive analysis to detect memory access errors, in:...
  • G. Holzmann, UNO: static source code checking for user-defined properties, in: 6th World Conf. on Integrated Design and...
  • D. Evans, D. Larochelle, Improving security using extensible lightweight static analysis, IEEE Software...
  • D. Wagner, J. Foster, E. Brewer, A. Aiken, A First Step Towards Automated Detection of Buffer Overrun Vulnerabilities,...
  • M. Zitser, R. Lippmann, T. Leek, Testing static analysis tools using exploitable buffer overflows from open source...
  • P. Emanuelsson, U. Nilsson, A Comparative Study of Industrial Static Analysis Tools (Extended Version), Technical...
  • Coverity Products, <http://www.coverity.com/products/coverity-prevent.html> (accessed September...
  • Klocwork Insight, <http://www.klocwork.com/products/insight/> (accessed September...
  • T. Hofer, Evaluating Static Source Code Analysis Tools, École Polytechnique Fédérale de Lausanne, 2010....
  • N. Rutar, C.B. Almazan, J.S. Foster, A comparison of bug finding tools for Java, 2004, in: Proceedings of the...
  • S. Wagner, J. Jürjens, C. Koller, P. Trischberger, 2005, Comparing bug finding tools with reviews and tests, in:...