1 Introduction

Software verification is an increasingly important research area, and the annual Competition on Software Verification (SV-COMP)Footnote 1 is the showcase of the state of the art in the area, in particular, of the effectiveness and efficiency that is currently achieved by tool implementations of the most recent ideas, concepts, and algorithms for fully-automatic verification. Every year, the SV-COMP project consists of two parts: (1) The collection of verification tasks and their partition into categories has to take place before the actual experiments start, and requires quality-assurance work on the source code in order to ensure a high-quality evaluation. It is important that the SV-COMP verification tasks reflect what the research and development community considers interesting and challenging for evaluating the effectiveness (soundness and completeness) and efficiency (performance) of state-of-the-art verification tools. (2) The actual experiments of the comparative evaluation of the relevant tool implementations are performed by the organizer of SV-COMP. Since SV-COMP shall stimulate and showcase new technology, it is necessary to explore and define standards for a reliable and reproducible execution of such a competition: we use BenchExec [10], a modern framework for reliable benchmarking and resource measurement, to run the experiments, and verification witnesses [7, 8] to validate the verification results.

As in every edition, this SV-COMP report describes the (updated) rules and definitions, presents the competition results, and discusses other interesting facts about the execution of the competition experiments. We also measure the success of SV-COMP by evaluating whether the main objectives of the competition are achieved (list taken from [5]):

  1. provide an overview of the state of the art in software-verification technology and increase visibility of the most recent software verifiers,

  2. establish a repository of software-verification tasks that is publicly available for free use as standard benchmark suite for evaluating verification software,

  3. establish standards that make it possible to compare different verification tools, including a property language and formats for the results, and

  4. accelerate the transfer of new verification technology to industrial practice.

As for (1), there were 32 participating software systems from 12 countries, representing a broad spectrum of technology (cf. Table 4). SV-COMP is considered an important event in the research community, and increasingly also in industry. This year, SV-COMP for the first time had two participating verification systems from industry. As for (2), the total set of verification tasks increased in size from 6 661 to 8 908. Still, SV-COMP has an ongoing focus on collecting and constructing verification tasks to ensure even more diversity. Compared to the last years, the level and amount of quality-assurance activities from the SV-COMP community increased significantly, as witnessed by the issue trackerFootnote 2 and by the pull requestsFootnote 3 in the GitHub project. As for (3), the largest step forward this year was to extend the standard witness language, a common, exchangeable format, to correctness witnesses as well (violation witnesses have been used before). This means that if a verifier reports False (claims to know an error path through the program that violates the specification), then it produces a violation witness; if a verifier reports True (claims to know a proof of correctness), then it produces a correctness witness. The two points of the SV-COMP scoring schema for correct answers True are assigned only if the correctness witness was confirmed by a witness validator, i.e., only if a proof of correctness could be reconstructed by a different tool. As for (4), we continuously received positive feedback from industry.

Related Competitions. It is well-understood that competitions are an important evaluation method, and there are other competitions in the field of software verification: RERSFootnote 4 [20] and VerifyThisFootnote 5 [22]. While SV-COMP performs replicable experiments in a controlled environment (dedicated resources, resource limits), the RERS Challenges give more room for exploring combinations of interactive and automatic approaches without limits on the resources, and the VerifyThis Competition focuses on evaluating approaches and ideas rather than on fully-automatic verification. The termination competition termCOMPFootnote 6 [16] concentrates on termination but considers a broader range of systems, including logic and functional programs. A more comprehensive list of other competitions is provided in the report on SV-COMP 2014 [4].

2 Procedure

The overall competition organization did not change in comparison to the past editions [2,3,4,5,6]. SV-COMP is an open competition, where all verification tasks are known before the submission of the participating verifiers, which is necessary due to the complexity of the language C. During the benchmark submission phase, new verification tasks were collected and classified; during the training phase, the teams inspected the verification tasks and trained their verifiers (also, the verification tasks received fixes and quality improvements); and during the evaluation phase, verification runs were performed with all competition candidates, and the system descriptions were reviewed by the competition jury. The participants received the results of their verifier directly via e-mail, and after a few days of inspection, the results were publicly announced on the competition web site. The Competition Jury consisted again of the chair and one member of each participating team. Team representatives of the jury are listed in Table 3.

3 Definitions, Formats, and Rules

Verification Task. The definition of verification task was not changed (taken from [4]). A verification task consists of a C program and a property. A verification run is a non-interactive execution of a competition candidate (verifier) on a single verification task, in order to check whether the following statement is correct: “The program satisfies the property.” The result of a verification run is a triple (answer, witness, time). answer is one of the following outcomes:

  • True: The property is satisfied (no path exists that violates the property), and a correctness witness is produced that contains hints to reconstruct the proof.

  • False: The property is violated (there exists a path that violates the property), and a violation witness is produced that contains hints to replay the error path to the property violation.

  • Unknown: The tool cannot decide the problem, or terminates abnormally, or exhausts the computing resources time or memory (the competition candidate does not succeed in computing an answer True or False).
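For illustration, the sketch below is a hand-constructed reachability verification task in the style of the benchmark collection (it is not taken from the repository). It uses the standard SV-COMP conventions: __VERIFIER_nondet_int models an arbitrary input value, and the property of the reachability categories requires that the function __VERIFIER_error is never called. The expected answer for this example is True, so a correct result must be accompanied by a correctness witness, and an answer False would be a false alarm.

    /* Hand-constructed example of a reachability verification task
     * (not taken from the benchmark repository).
     * Expected answer: True, because the call to __VERIFIER_error()
     * is unreachable for every input value. */
    extern int __VERIFIER_nondet_int(void);
    extern void __VERIFIER_error(void);

    int main(void) {
      int x = __VERIFIER_nondet_int();   /* arbitrary input */
      if (x > 0 && x < 1000) {
        int y = 2 * x;                   /* no overflow in this range */
        if (y <= x) {                    /* impossible for 0 < x < 1000 */
          __VERIFIER_error();            /* never reached */
        }
      }
      return 0;
    }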

Fig. 1. Categories; left: SV-COMP 2016; right: SV-COMP 2017; category Falsification contains all verification tasks of Overall without Termination

The witness component of the result [7, 8] was, for the first time this year, mandatory for both answers True and False; a few categories were excluded from validation if the validators did not sufficiently support a certain kind of program or property. We used the two publicly available witness validators CPAchecker and UAutomizer. time is measured as consumed CPU time until the verifier terminates, including the consumed CPU time of all processes that the verifier started [10]. If time is equal to or larger than the time limit (15 min), then the verifier is terminated and the answer is set to ‘timeout’ (and interpreted as Unknown).

Table 1. Properties used in SV-COMP 2017 (cf. [5] for more details)

Categories. The collection of verification tasks is partitioned into categories. A major update was done on the structure of the categories, in order to support various extensions that were planned for SV-COMP 2017. For example, the categories Overflows and Termination were considerably extended (Overflows from 12 to 328 and Termination from 631 to 1 437 verification tasks). Figure 1 shows the previous structure of main and sub-categories on the left, and the new structure is shown on the right. The guideline is to have main categories that correspond to different properties and sub-categories that reflect the type of program. The goal of the category SoftwareSystems is to complement the other categories (which sometimes contain small and constructed examples to show certain verification features) by large and complicated verification tasks from real software systems (further structured according to system and property to verify). The category assignment was proposed and implemented by the competition chair, and approved by the competition jury. SV-COMP 2017 has a total of eight categories for which award plaques are handed out, namely the six main categories, category Overall, which contains the union of all categories, and category Falsification. Category Falsification consists of all verification tasks with safety properties, and any answers True are not counted for the score (the goal of this category is to show the bug-hunting capabilities of verifiers that are not able to construct correctness proofs). The categories are described in more detail on the competition web site.Footnote 7

Table 2. Scoring schema for SV-COMP 2017
Fig. 2. Visualization of the scoring schema for the reachability property

Properties and Their Format. For the definition of the properties and the property format, we refer to the previous competition report [5]. All specifications are available in the main directory of the benchmark repository. Table 1 lists the properties and their syntax as overview.
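As a reading aid for Table 1, the reachability property can be spelled out explicitly. The rendering below is a sketch of how this property is phrased in the SV-COMP property syntax, recalled from the competition rules; Table 1 and the benchmark repository remain authoritative.

    % Sketch of the ReachSafety (unreach-call) property; Table 1 is authoritative.
    % Property-file syntax: CHECK( init(main()), LTL(G ! call(__VERIFIER_error())) )
    \[
      \mathit{init}(\mathit{main}()) \models \mathrm{G}\, \neg\, \mathit{call}(\texttt{\_\_VERIFIER\_error()})
    \]

Read as: starting from the initial states of function main, it globally holds that the function __VERIFIER_error is never called.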

Evaluation by Scores and Run Time. The scoring schema of SV-COMP 2017 is similar to the previous scoring schema, except that results with answer True are now assigned two points only if the witness was confirmed by a validator, and one point is assigned if the answer matches the expected result but the witness was not confirmed. Table 2 provides the overview and Fig. 2 visually illustrates the score assignment for one property. The ranking is decided based on the sum of points (normalized for meta categories) and, in case of a tie, based on success run time, which is the total CPU time over all verification tasks for which the verifier reported a correct verification result. Opt-out from Categories and Score Normalization for Meta Categories were done as described previously [3] (page 597).
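To make the score assignment concrete, the C function below sketches the decision logic described above. Only the cases spelled out in the text are certain; the penalty values for wrong answers (−16 for a false alarm, −32 for a wrong proof) and the treatment of a correct but unconfirmed False as 0 points are assumptions to be checked against Table 2, which remains authoritative. (In category Falsification, answers True would additionally be ignored, which the sketch does not model.)

    /* Sketch of the score assignment for a single verification run.
     * The penalty constants and the 0 points for a correct but unconfirmed
     * False are assumptions; Table 2 is authoritative. */
    #include <stdio.h>

    typedef enum { TRUE_ANSWER, FALSE_ANSWER, UNKNOWN_ANSWER } answer_t;

    static int score(answer_t expected, answer_t reported, int confirmed) {
      if (reported == UNKNOWN_ANSWER)        /* includes timeouts and crashes */
        return 0;
      if (reported == expected) {            /* correct answer */
        if (reported == TRUE_ANSWER)
          return confirmed ? 2 : 1;          /* correctness witness */
        return confirmed ? 1 : 0;            /* violation witness */
      }
      return reported == FALSE_ANSWER ? -16  /* false alarm (assumed penalty) */
                                      : -32; /* wrong proof (assumed penalty) */
    }

    int main(void) {
      /* A correct True with a confirmed correctness witness yields 2 points. */
      printf("%d\n", score(TRUE_ANSWER, TRUE_ANSWER, 1));
      return 0;
    }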

4 Reproducibility

It is important that the SV-COMP experiments can be independently replicated, and that the results can be reproduced. Therefore, all major components that are used for the competition need to be publicly available. Figure 3 gives an overview of the components that contribute to the reproducible setup of SV-COMP.

Fig. 3. Setup: SV-COMP components that support reproducibility

Repositories for Verification Tasks (a), Benchmark Definitions (b), and Tool-Information Modules (c). The previous competition report [6] describes how replicability is ensured by making all essential ingredients available in public archives. The verification tasks (a) are available via the tag ‘svcomp17’ in a public Git repository.Footnote 8 The benchmark definitions (b) define for each verifier (i) on which verification tasks the verifier is to be executed (each verifier can choose which categories to participate in) and (ii) which parameters need to be passed to the verifier (there are global parameters that are specified for all categories, and there are specific parameters such as the bit architecture). The benchmark definitions are available via the tag ‘svcomp17’ in another public Git repository.Footnote 9 The tool-information modules (c) ensure, for each verifier, that the command line to execute the verifier is correctly assembled (including source and property file as well as the options) from the parts specified in the benchmark definition (b), and that the results of the verifier are correctly interpreted and translated into the uniform SV-COMP result (True, False(p), Unknown). The tool-info modules that were used for SV-COMP 2017 are available in BenchExec 1.10.Footnote 10

Reliable Assignment and Controlling of Computing Resources (e). We use BenchExecFootnote 11 [10] to satisfy the requirements for scientifically valid experimentation, such as (i) accurate measurement and reliable enforcement of limits for CPU time and memory, and (ii) reliable termination of processes (including all child processes). For the first time in SV-COMP, we used BenchExec’s container mode, in order to make sure that read and write operations are properly controlled. For example, it was previously not automatically and reliably enforced that tools do not exceed the assigned memory limit by using a RAM disk. This and some other issues that previously required manual inspection and analysis are now systematically solved.
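To illustrate why a dedicated framework is needed, the following minimal sketch limits and measures the CPU time of a single child process using only standard POSIX primitives. It captures only waited-for direct children and controls neither memory nor file-system access; closing exactly these gaps (for all descendant processes, with memory limits and the container mode mentioned above) is what the framework of [10] provides. The command that is executed is a placeholder.

    /* Minimal sketch: run one command under a CPU-time limit and measure
     * its CPU time. This covers only waited-for direct children; escaping
     * descendant processes, memory limits, and I/O isolation are exactly
     * what the benchmarking framework [10] adds on top. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/resource.h>
    #include <sys/wait.h>

    int main(void) {
      pid_t pid = fork();
      if (pid == 0) {                                   /* child: the "verifier" */
        struct rlimit cpu = { .rlim_cur = 900, .rlim_max = 900 };  /* 15 min */
        setrlimit(RLIMIT_CPU, &cpu);                    /* kernel enforces the limit */
        execlp("true", "true", (char *)NULL);           /* placeholder command */
        _exit(127);
      }
      int status;
      waitpid(pid, &status, 0);
      struct rusage ru;
      getrusage(RUSAGE_CHILDREN, &ru);                  /* CPU time of waited-for children */
      double cpu_s = (double)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
                   + (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
      printf("consumed CPU time: %.3f s (%s)\n", cpu_s,
             WIFSIGNALED(status) ? "terminated by signal, e.g., limit reached"
                                 : "terminated normally");
      return 0;
    }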

Violation Witnesses (f) and Correctness Witnesses (g). In SV-COMP, each verification run (if applicable) is followed by a validation run that checks whether the witness adheres to the exchange format and can be confirmed. The resource limits for the witness validators were 2 processing units (one physical CPU core with hyper-threading), 7 GB memory, and 10% of the verification time (i.e., 1.5 min) for violation witnesses and 100% (15 min) for correctness witnesses. The purpose of the tighter resource limits is to avoid delegating all verification work to the validator. This witness-based validation process ensures a higher quality of the score assignment than would be possible without witnesses: if a verifier claims to have found a bug but is not able to provide a witness, then the verifier does not get the full score. The witness format and the validation process are explained on the witness-format web pageFootnote 12. The version of the exchange format that was used for SV-COMP 2017 has the tag ‘svcomp17’. More details on witness validation are given in two related research articles [7, 8].

Verifier Archives (d). Due to legal issues, we do not re-distribute the verifiers on the competition web site, but list for each verifier a URL to an archive that the participants promised to keep publicly available, together with the SHA1 hash of the archive that was used in SV-COMP. An overview table is provided on the systems-description page of the competition web siteFootnote 13. For replicating experiments, the archive can be downloaded and verified against the given SHA1 hash. Each archive contains all parts that are needed to execute the verifier (statically-linked executables and all components that are required in a certain version, or for which no standard Ubuntu package is available). The archives are also supposed to contain a license that permits use in SV-COMP and replication of the SV-COMP experiments, states that all data that the verifier produces as output are property of the person that executes the verifier, and allows the results obtained from the verifier to be published without any restriction.
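For the replication step just described, the sketch below checks a downloaded archive against its published SHA1 hash. It uses OpenSSL’s SHA1 routines (link with -lcrypto); the file name and the expected hash are placeholders, and in practice a call to the sha1sum utility achieves the same.

    /* Sketch: verify a downloaded verifier archive against the SHA1 hash
     * published on the systems-description page. File name and expected
     * hash are placeholders. Build: cc check_sha1.c -lcrypto */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/sha.h>

    int main(void) {
      const char *path = "verifier-archive.zip";                      /* placeholder */
      const char *expected =
          "0123456789abcdef0123456789abcdef01234567";                 /* placeholder */

      FILE *f = fopen(path, "rb");
      if (!f) { perror(path); return 2; }

      SHA_CTX ctx;
      SHA1_Init(&ctx);
      unsigned char buf[1 << 16];
      size_t n;
      while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        SHA1_Update(&ctx, buf, n);
      fclose(f);

      unsigned char digest[SHA_DIGEST_LENGTH];
      SHA1_Final(digest, &ctx);

      char hex[2 * SHA_DIGEST_LENGTH + 1];
      for (int i = 0; i < SHA_DIGEST_LENGTH; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);

      printf("computed SHA1: %s\n", hex);
      return strcmp(hex, expected) == 0 ? 0 : 1;   /* 0 iff the archive matches */
    }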

Table 3. Competition candidates with tool references and representing jury members
Table 4. Technologies and features that the competition candidates offer

5 Results and Discussion

For the sixth time, the competition experiments represent the state of the art in fully-automatic software-verification tools. The report shows the improvements of the last year, in terms of effectiveness (number of verification tasks that can be solved, correctness of the results, as accumulated in the score) and efficiency (resource consumption in terms of CPU time). The results that are presented in this article were inspected and approved by the participating teams.

Participating Verifiers. Table 3 provides an overview of the participating competition candidates and Table 4 lists the features and technologies that are used in the verification tools.

Table 5. Quantitative overview over all results; empty cells mark opt-outs
Table 6. Overview of the top-three verifiers for each category (CPU time in h, rounded to two significant digits)

Computing Resources. The resource limits were the same as last year [6]: Each verification run was limited to 8 processing units (cores), 15 GB of memory, and 15 min of CPU time. The witness validation was limited to 2 processing units, 7 GB of memory, and 1.5 min of CPU time for violation witnesses and 15 min of CPU time for correctness witnesses. The machines for running the experiments were different from last year, because we now had 168 machines available and each verification run could be executed on a completely unloaded, dedicated machine, in order to achieve precise measurements. Each machine had one Intel Xeon E3-1230 v5 CPU, with 8 processing units each, a frequency of 3.4 GHz, 33 GB of RAM, and a GNU/Linux operating system (x86_64-linux, Ubuntu 16.04 with Linux kernel 4.4).

Table 7. Necessary effort to compute results False versus True (measurement values rounded to two significant digits)

One complete verification execution of the competition consisted of 421 benchmarks (each verifier on each selected category according to the opt-outs), summing up to 170 417 verification runs. Witness validation required 678 benchmarks (combinations of verifier, category with witness validation, and two validators), summing up to 232 916 validation runs. One complete competition run for verification required a total of 490 days of CPU time. Each tool was executed several times, in order to make sure that no installation issues occurred during the execution. We used BenchExec [10] to measure and control computing resources (CPU time, memory, CPU energy) and VerifierCloudFootnote 14 to distribute, install, run, and clean up verification runs, and to collect the results.

Quantitative Results. Table 5 presents the quantitative overview over all tools and all categories (one verifier participated only in the subcategories ReachSafety-Heap, MemSafety-Heap, and MemSafety-LinkedLists; another participated only in some subcategories of ReachSafety). The head row mentions the category, the maximal score for the category, and the number of verification tasks. The tools are listed in alphabetical order; every table row lists the scores of one verifier for each category. We indicate the top-three candidates by formatting their scores in bold face and in larger font size. An empty table cell means that the verifier opted out from the respective category. There was one category for which the winner was decided based on the run time: in category ConcurrencySafety, all top-three verifiers achieved the maximum score of 1293 points, but the run time differed. More information (including interactive tables, quantile plots for every category, and also the raw data in XML format) is available on the competition web site.Footnote 15

Table 6 reports the top-three verifiers for each category. The run time (column ‘CPU Time’) refers to successfully solved verification tasks (column ‘Solved Tasks’). The columns ‘False Alarms’ and ‘Wrong Proofs’ report the number of verification tasks for which the verifier reported wrong results: reporting an error path but the property holds (incorrect False) and claiming that the program fulfills the property although it actually contains a bug (incorrect True), respectively.

Discussion of Scoring Schema and Normalization. The verification community considers it more difficult to compute correctness proofs than to compute error paths, and the scoring schema reflects this: according to Table 2, an answer True yields 2 points (confirmed witness) or 1 point (unconfirmed witness), while an answer False yields 1 point (confirmed witness). This can have consequences for the final ranking, as discussed in the report on the last SV-COMP edition [6].

Assigning a higher score value to results True (compared to results False) seems justified by the CPU time and energy that the verifiers need to compute the result. Table 7 shows actual numbers on this: the first column lists the three best verifiers of category Overall, the second and third columns report the average CPU time and average CPU energy for results True, and the fourth and fifth columns for results False. The average is taken over all verification tasks; the CPU time is reported in seconds and the CPU energy in Joule (BenchExec reads and accumulates the energy measurements of Intel CPUs). For one of these verifiers in particular, the effort to compute results True is significantly higher than the effort to compute results False: 210 s versus 51 s of average CPU time per verification task, and 2 200 J versus 580 J of average CPU energy.

A similar consideration applies to the score normalization. The community considers the value of each category equal, which has the consequence that solving a verification task in a large category (with many, often similar, verification tasks) has less value than solving a verification task in a small category (with only a few verification tasks) [3]. The values for category Overall in Table 6 illustrate the purpose of the score normalization: one verifier solved 5 393 tasks, which is 791 solved tasks more than the winner could solve (4 602). So why did that verifier not win the category? Because the winner is better in the intuitive sense of ‘overall’: it solved tasks more diversely, so the ‘overall’ value of its verification work is higher. Thus, the winner received 7 099 points and the other verifier received 5 296 points. Similarly, in category SoftwareSystems, one verifier solved 177 more tasks than another, but the tasks that it solved were considered of less value (i.e., they came from large categories); the other verifier was able to solve considerably more verification tasks in the seemingly difficult BusyBox categories. In these cases, the score normalization correctly maps the community’s intuition.

Score-Based Quantile Functions for Quality Assessment. We use score-based quantile functions [3] because these visualizations make it easier to understand the results of the comparative evaluation. The web site (see footnote 15) includes such a plot for each category; as an example, we show the plot for category Overall (all verification tasks) in Fig. 4. A total of 15 verifiers participated in category Overall, for which the quantile plot shows the overall performance over all categories (scores for meta categories are normalized [3]). A more detailed discussion of score-based quantile plots, including examples of what interesting insights one can obtain from the plots, is provided in previous competition reports [3, 6].

Fig. 4. Quantile functions for category Overall. Each quantile function illustrates the quantile (x-coordinate) of the scores obtained by correct verification runs below a certain run time (y-coordinate). More details were given previously [3]. A logarithmic scale is used for the time range from 1 s to 1000 s, and a linear scale is used for the time range between 0 s and 1 s.
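To make the construction of such a plot concrete, the sketch below computes the data points of a score-based quantile function from hand-made example data: the correct runs of one verifier are sorted by CPU time, and their scores are accumulated; plotting the accumulated score (x) against the run time of the respective run (y) yields a curve like those in Fig. 4. (In the competition plots, the curves additionally account for penalty points of wrong results, which is not modeled here.)

    /* Sketch: data points of a score-based quantile function, computed from
     * hand-made (CPU time, score) pairs of correct verification runs. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { double cpu_time; int score; } run_t;

    static int by_time(const void *a, const void *b) {
      double d = ((const run_t *)a)->cpu_time - ((const run_t *)b)->cpu_time;
      return (d > 0) - (d < 0);
    }

    int main(void) {
      run_t runs[] = {                     /* correct runs only; example data */
        { 0.4, 2 }, { 12.0, 1 }, { 3.1, 2 }, { 650.0, 2 }, { 88.5, 1 },
      };
      size_t n = sizeof runs / sizeof runs[0];
      qsort(runs, n, sizeof runs[0], by_time);

      int accumulated = 0;
      for (size_t i = 0; i < n; i++) {     /* one plot point per correct run */
        accumulated += runs[i].score;
        printf("x = %3d points   y = %7.1f s\n", accumulated, runs[i].cpu_time);
      }
      return 0;
    }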

Correctness of Results. Out of those verifiers that participated in all categories, only one verifier did not report any wrong result at all; some of the others did not report any false alarm, and some did not report any wrong proof.

Table 8. Confirmation rate of witnesses

Verifiable Witnesses. For SV-COMP, it is not sufficient to answer with just True or False: each answer must be accompanied by a verification witness. For correctness witnesses, an unconfirmed answer True was still accepted, but was assigned only 1 point instead of 2 (cf. Table 2). All verifiers in categories that required witness validation support the common exchange format for violation and correctness witnesses. We used the two independently developed witness validators that are integrated in CPAchecker and UAutomizer [7, 8].

It is interesting to see that the majority of witnesses that the top-three verifiers produced can be confirmed by the witness-validation process (more than 90%). Table 8 shows the confirmed versus unconfirmed results: the first column lists the three best verifiers of category Overall, the three columns for result True report the total, confirmed, and unconfirmed number of verification tasks for which the verifier answered True, and the three columns for result False report the corresponding numbers for the answer False. More information (for all verifiers) is given in the detailed tables on the competition web site (see footnote 15); cf. also the report on the demo category for correctness witnesses from SV-COMP 2016 [6].

6 Conclusion

SV-COMP 2017, the 6\(^{\text {th}}\) edition of the Competition on Software Verification, attracted 32 participating teams from 12 countries (number of teams 2012: 10, 2013: 11, 2014: 15, 2015: 22, 2016: 35). SV-COMP continues to be the broadest overview of the state of the art in automatic software verification. For the first time in verification history, proof hints (stored in an exchangeable witness) from verifiers were used on a large scale to help a different tool (a validator) check whether it can, given the proof hints, reconstruct a proof of correctness. Given the results (cf. Table 8), this approach is successful. The two points for the results True were counted only if the correctness witness was confirmed; for unconfirmed results True, only 1 point was assigned. The number of verification tasks was increased from 6 661 to 8 908. The partitioning of the verification tasks into categories was considerably restructured; the categories Overflows, MemSafety, and Termination were extended and structured using sub-categories; many verification tasks from the software system BusyBox were added to the category SoftwareSystems. As before, the large jury and the organizer made sure that the competition follows the high quality standards of the TACAS conference, in particular with respect to the important principles of fairness, community support, and transparency.