Compiling C and C++ Programs for Dynamic White-Box Analysis

Baranová, Zuzana; Ročkai, Petr

doi:10.1007/978-3-030-54994-7_4

Conference paper
First Online: 13 August 2020

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12232)

Abstract

Building software packages from source is a complex and highly technical process. For this reason, most software comes with build instructions which have both a human-readable and an executable component. The latter in turn requires substantial infrastructure, which helps software authors deal with two major sources of complexity: first, generation and management of various build artefacts and their dependencies, and second, the differences between platforms, compiler toolchains and build environments.

This poses a significant problem for white-box analysis tools, which often require that the source code of the program under test is compiled into an intermediate format, like the LLVM IR. In this paper, we present divcc, a drop-in replacement for C and C++ compilation tools which transparently fits into existing build tools and software deployment solutions. Additionally, divcc generates intermediate and native code in a single pass, ensuring that the final executable is built from the intermediate code that is being analysed.

This work has been partially supported by the Czech Science Foundation grant No. 18-02177S.

Notes

1. Of course, tools which work with machine code, known as black-box tools, do exist, but their use in software development is limited – they are mainly used in software forensics. In this paper, we focus on white-box methods, which work with source code or an intermediate representation thereof.
2. A separate (often third-party) software package which needs to be installed in the system before the build can proceed – usually a library, sometimes a tool used in the build process.
3. Not doing so could lead to configuration mismatches between the two compilers, causing build failures or, worse, miscompilation.
4. This is true even in cases where such tools can work with partial programs, i.e. programs which use functions whose definitions are not available to the tool; however, this mode of operation negatively affects the precision of the analysis.
5. And subsequently also in static libraries, which on POSIX systems are simply archives of object files.
6. Source code and supplementary material are available at https://divine.fi.muni.cz/2019/divcc.
7. Most build systems also attempt to speed up repeated builds by avoiding re-compiling files that are unchanged (and whose dependencies also remain up-to-date). This capability is important during development and testing, though of course it adds further complexity to the process.
8. We use a section named .llvmbc, which is the same name as the one used by the LTO subsystem. This section is recognized by some LLVM tools and is the closest there is to a ‘standard’ way to embed bitcode in object files.
9. This is important with e.g. libraries provided by DiOS, which are normally only compiled into bitcode and packaged into bitcode archives.
10. Unfortunately, libpthread, which is also provided by DiOS, is not yet ABI compatible with the host version – see also Sect. 4.2.
11. This decision may be reversed in a later version, if the situation with support for shared libraries in analysis tools improves.
12. In some cases, it may be possible to reconstruct platform-neutral LLVM IR using the Remill decompilation library. This is especially pertinent to legacy software which may use inline assembly in applications which would be better served with compiler built-in functions. We will investigate using Remill in this capacity as an option in the future.
13. This is the reason for zero build time of Eigen for all compilers.
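
Note 5 observes that static libraries on POSIX systems are simply archives of object files. The following is a minimal Python sketch of what that means in practice, assuming the common Unix ar layout (an 8-byte magic string followed by members, each with a 60-byte ASCII header and data padded to an even length). The names list_ar_members and make_member are hypothetical helpers introduced for illustration; they are not part of divcc, and the sketch does not resolve GNU long-name references or BSD extended names.

```python
from typing import List, Tuple

AR_MAGIC = b"!<arch>\n"   # global header of a Unix ar archive
HEADER_SIZE = 60          # fixed size of each member header


def list_ar_members(data: bytes) -> List[Tuple[str, int]]:
    """Return (name, size) for each regular member of an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(data):
        header = data[offset:offset + HEADER_SIZE]
        # Every member header ends with the two-byte marker "`\n".
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        raw_name = header[0:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii"))
        offset += HEADER_SIZE
        # GNU ar stores its symbol table as "/" and its long-name table
        # as "//"; these are bookkeeping members, not object files.
        if raw_name not in ("/", "//"):
            members.append((raw_name.rstrip("/"), size))
        # Member data is padded with one byte when its size is odd.
        offset += size + (size % 2)
    return members


def make_member(name: str, payload: bytes) -> bytes:
    """Build one archive member (hypothetical helper for the demo)."""
    header = (
        f"{name + '/':<16}{0:<12}{0:<6}{0:<6}{'644':<8}{len(payload):<10}"
    ).encode("ascii") + b"`\n"
    padding = b"\n" if len(payload) % 2 else b""
    return header + payload + padding


if __name__ == "__main__":
    archive = AR_MAGIC + make_member("a.o", b"abc") + make_member("b.o", b"defg")
    print(list_ar_members(archive))
```

Since each member is an ordinary object file stored verbatim, any section embedded in an object file (such as the bitcode section described in note 8) is carried into the static library unchanged.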

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Zuzana Baranová & Petr Ročkai

Corresponding author

Correspondence to Petr Ročkai.

Editor information

Editors and Affiliations

McMaster University, Hamilton, ON, Canada
Emil Sekerinski
University of Porto, Porto, Portugal
Nelma Moreira
University of Minho, Braga, Portugal
José N. Oliveira
Argo AI, Munich, Germany
Daniel Ratiu
University of Pisa, Pisa, Italy
Riccardo Guidotti
University of Liverpool, Liverpool, UK
Marie Farrell
University of Liverpool, Liverpool, UK
Matt Luckcuck
University of Exeter, Exeter, UK
Diego Marmsoler
University of Minho, Braga, Portugal
José Campos
University of Newcastle, Newcastle upon Tyne, UK
Troy Astarte
Claude Bernard University, Lyon, France
Laure Gonnord
Nazarbayev University, Nur-Sultan, Kazakhstan
Antonio Cerone
University of Surrey, Guildford, UK
Luis Couto
University of Surrey, Guildford, UK
Brijesh Dongol
University of Giessen, Giessen, Germany
Martin Kutrib
University of Lisbon, Lisbon, Portugal
Pedro Monteiro
Airbus Operations S.A.S., Toulouse, France
David Delmas

About this paper

Cite this paper

Baranová, Z., Ročkai, P. (2020). Compiling C and C++ Programs for Dynamic White-Box Analysis. In: Sekerinski, E., et al. Formal Methods. FM 2019 International Workshops. FM 2019. Lecture Notes in Computer Science, vol 12232. Springer, Cham. https://doi.org/10.1007/978-3-030-54994-7_4

DOI: https://doi.org/10.1007/978-3-030-54994-7_4
Published: 13 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54993-0
Online ISBN: 978-3-030-54994-7
eBook Packages: Computer Science, Computer Science (R0)
