Compiling C and C++ Programs for Dynamic White-Box Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12232)

Abstract

Building software packages from source is a complex and highly technical process. For this reason, most software comes with build instructions which have both a human-readable and an executable component. The latter in turn requires substantial infrastructure, which helps software authors deal with two major sources of complexity: first, generation and management of various build artefacts and their dependencies, and second, the differences between platforms, compiler toolchains and build environments.

This poses a significant problem for white-box analysis tools, which often require that the source code of the program under test is compiled into an intermediate format, like the LLVM IR. In this paper, we present divcc, a drop-in replacement for C and C++ compilation tools which transparently fits into existing build tools and software deployment solutions. Additionally, divcc generates intermediate and native code in a single pass, ensuring that the final executable is built from the intermediate code that is being analysed.

This work has been partially supported by the Czech Science Foundation grant No. 18-02177S.

Notes

  1. Of course, tools which work with machine code, known as black-box tools, do exist, but their use in software development is limited – they are mainly used in software forensics. In this paper, we focus on white-box methods, which work with source code or an intermediate representation thereof.

  2. A separate (often third-party) software package which needs to be installed in the system before the build can proceed – usually a library, sometimes a tool used in the build process.

  3. Not doing so could lead to configuration mismatches between the two compilers causing build failures, or worse, miscompilation.

  4. This is true even in cases where such tools can work with partial programs – i.e. programs which use functions whose definitions are not available to the tool; however, this mode of operation negatively affects the precision of the analysis.

  5. And subsequently also in static libraries, which on POSIX systems are simply archives of object files.

  6. Source code & supplementary material at https://divine.fi.muni.cz/2019/divcc.

  7. Most build systems also attempt to speed up repeated builds by avoiding re-compiling files that are unchanged (and whose dependencies also remain up-to-date). This capability is important during development and testing, though of course it adds further complexity to the process.

  8. We use a section named .llvmbc, the same section that the LTO subsystem uses. This section is recognized by some LLVM tools and is the closest there is to a ‘standard’ way to embed bitcode in object files.

  9. This is important with e.g. libraries provided by DiOS, which are normally only compiled into bitcode and packaged into bitcode archives.

  10. Unfortunately, libpthread, which is also provided by DiOS, is not yet ABI compatible with the host version – see also Sect. 4.2.

  11. This decision may be reversed in a later version, if the situation with support for shared libraries in analysis tools improves.

  12. In some cases, it may be possible to reconstruct platform-neutral LLVM IR using the Remill decompilation library. This is especially pertinent to legacy software which may use inline assembly in applications which would be better served with compiler built-in functions. We will investigate using Remill in this capacity as an option in the future.

  13. This is the reason for zero build time of Eigen for all compilers.
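
Note 5 observes that static libraries on POSIX systems are simply archives of object files. The sketch below is an illustration only and is not taken from divcc. It assumes the common Unix ar layout (an 8-byte global magic followed by 60-byte member headers) and ignores extensions such as GNU long-name tables and symbol-table members. The helper names make_member and list_members are hypothetical.

```python
import io

AR_MAGIC = b"!<arch>\n"   # global header of the common Unix ar format
HEADER_SIZE = 60          # fixed-size header that precedes every member


def make_member(name, data):
    """Build one archive member (header + data) for demonstration purposes."""
    header = (
        f"{name + '/':<16}"   # member name, GNU-style '/' terminator
        f"{0:<12}"            # modification time
        f"{0:<6}{0:<6}"       # owner id and group id
        f"{'100644':<8}"      # file mode (octal digits)
        f"{len(data):<10}"    # size of the member data in bytes
        "`\n"                 # header terminator
    ).encode("ascii")
    padding = b"\n" if len(data) % 2 else b""  # members are 2-byte aligned
    return header + data + padding


def list_members(archive):
    """Return a (name, size) pair for each member of an ar archive."""
    stream = io.BytesIO(archive)
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            break
        name = header[:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        stream.seek(size + size % 2, io.SEEK_CUR)  # skip data and padding
    return members


# Two placeholder "object files" packed into an in-memory archive.
archive = AR_MAGIC + make_member("foo.o", b"\x7fELF...") + make_member("bar.o", b"\x7fELF")
print(list_members(archive))  # prints [('foo.o', 7), ('bar.o', 4)]
```

Because each member is stored verbatim, any section embedded in an object file (such as the bitcode section mentioned in note 8) is carried into the static library unchanged.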

References

  1. Abdulla, P.A., Aronis, S., Atig, M.F., Jonsson, B., Leonardsson, C., Sagonas, K.: Stateless model checking for TSO and PSO. Acta Informatica 54(8), 789–818 (2016). https://doi.org/10.1007/s00236-016-0275-0

  2. Baranová, Z., et al.: Model checking of C and C++ with DIVINE 4 (2017)

  3. Cadar, C., Dunbar, D., Engler, D.R.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: OSDI, pp. 209–224. USENIX Association (2008)

  4. Chalupa, M., Vitovská, M., Jonáš, M., Slaby, J., Strejček, J.: Symbiotic 4: beyond reachability. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 385–389. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_28

  5. Günther, H., Laarman, A., Weissenbacher, G.: Vienna verification tool: IC3 for parallel software (competition contribution). In: TACAS, pp. 954–957 (2016). https://doi.org/10.1007/978-3-662-49674-9_69

  6. Kokologiannakis, M., Lahav, O., Sagonas, K., Vafeiadis, V.: Effective stateless model checking for C/C++ concurrency. Proc. ACM Program. Lang. 2(POPL), 17:1–17:32 (2017). https://doi.org/10.1145/3158105

  7. Kremenek, T., et al.: Scan-build (2009). https://clang-analyzer.llvm.org/scan-build.html

  8. Kroening, D., Tautschnig, M.: CBMC – C bounded model checker. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 389–391. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_26

  9. Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. In: PLDI (2007)

  10. Ročkai, P., Baranová, Z., Mrázek, J., Kejstová, K., Barnat, J.: Reproducible execution of POSIX programs with DiOS. In: Ölveczky, P.C., Salaün, G. (eds.) SEFM 2019. LNCS, vol. 11724, pp. 333–349. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30446-1_18

  11. Sinz, C., Merz, F., Falke, S.: LLBMC: a bounded model checker for LLVM’s intermediate representation. In: Flanagan, C., König, B. (eds.) TACAS 2012. LNCS, vol. 7214, pp. 542–544. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28756-5_44

  12. The LLVM Project. LLVM Link Time Optimization (2019). https://www.llvm.org/docs/LinkTimeOptimization.html

  13. Thompson, S., Brat, G.: Verification of C++ flight software with the MCP model checker. In: 2008 IEEE Aerospace Conference, pp. 1–9 (2008). https://doi.org/10.1109/AERO.2008.4526577

Author information

Corresponding author

Correspondence to Petr Ročkai.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Baranová, Z., Ročkai, P. (2020). Compiling C and C++ Programs for Dynamic White-Box Analysis. In: Sekerinski, E., et al. Formal Methods. FM 2019 International Workshops. FM 2019. Lecture Notes in Computer Science, vol 12232. Springer, Cham. https://doi.org/10.1007/978-3-030-54994-7_4

  • DOI: https://doi.org/10.1007/978-3-030-54994-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-54993-0

  • Online ISBN: 978-3-030-54994-7

  • eBook Packages: Computer Science (R0)