DOI: 10.1145/3650212.3685313
research-article

PolyTracker: Whole-Input Dynamic Information Flow Tracing

Published: 11 September 2024

Abstract

We present PolyTracker, a whole-program, whole-input dynamic information flow tracing (DIFT) framework. Given an LLVM-compatible codebase or a binary that has been lifted to LLVM intermediate representation (IR), PolyTracker compiles it, adding static instrumentation. The instrumented program runs normally with modest performance overhead, but additionally outputs a runtime trace artifact in the co-designed TDAG (Tainted Directed Acyclic Graph) format. TDAGs can be post-processed for a variety of analyses, including tracing every input byte through program execution. TDAGs can be generated either by running the program over a corpus of inputs or by employing a randomized input generator such as a fuzzer. PolyTracker traces (TDAGs) are useful for more than the very localized, targeted dynamic program analysis typical of smaller-scale DIFT: they are primarily intended for whole-program runtime exploration and bug finding, granular information-flow diffing between program runs, and comparisons of implementations of the same input specification, all without any need to emulate and instrument the entire running environment. For user-friendliness and reproducibility, the software repository provides a number of examples of PolyTracker-instrumented builds of popular open-source software projects. We also provide an analysis library and REPL written in Python that are designed to assist users with operating over TDAGs.
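
To make the idea of tracing every input byte through a taint DAG concrete, the following is a minimal sketch in Python. It is a toy model only: the class `ToyTaintDag` and its methods `source`, `union`, and `input_offsets` are hypothetical names invented for this illustration, and they do not reflect the actual TDAG file format or the API of PolyTracker's Python analysis library. The sketch assumes that each label either names one input byte offset or is the union of two earlier labels, so the labels form a directed acyclic graph.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple


@dataclass
class ToyTaintDag:
    """Hypothetical toy taint DAG; NOT the real TDAG format or PolyTracker API."""

    # label -> input byte offset, for labels created directly from input bytes
    sources: Dict[int, int] = field(default_factory=dict)
    # label -> (parent, parent), for labels derived by combining two others
    unions: Dict[int, Tuple[int, int]] = field(default_factory=dict)
    # label 0 is reserved here to mean "untainted"
    next_label: int = 1

    def _fresh(self) -> int:
        label = self.next_label
        self.next_label += 1
        return label

    def source(self, offset: int) -> int:
        """Create a label for the input byte at the given offset."""
        label = self._fresh()
        self.sources[label] = offset
        return label

    def union(self, a: int, b: int) -> int:
        """Create a label for a value computed from two labeled values."""
        if a == 0 or a == b:
            return b
        if b == 0:
            return a
        label = self._fresh()
        self.unions[label] = (a, b)
        return label

    def input_offsets(self, label: int) -> FrozenSet[int]:
        """Walk the DAG back to its sources and collect input byte offsets."""
        offsets: Set[int] = set()
        seen: Set[int] = set()
        stack = [label]
        while stack:
            current = stack.pop()
            if current == 0 or current in seen:
                continue
            seen.add(current)
            if current in self.sources:
                offsets.add(self.sources[current])
            else:
                stack.extend(self.unions[current])
        return frozenset(offsets)


# Example: a length field read from bytes 0 and 1, then mixed with byte 2.
dag = ToyTaintDag()
b0, b1, b2 = (dag.source(i) for i in range(3))
length = dag.union(b0, b1)
checksum = dag.union(length, b2)
print(sorted(dag.input_offsets(checksum)))
```

Under the same assumptions, comparing the offset sets that two runs (or two implementations) report for corresponding values gives a rough picture of what information-flow diffing means, although a real TDAG records considerably more than this toy does.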

Published In

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis
September 2024
1928 pages
ISBN:9798400706127
DOI:10.1145/3650212
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DIFT
  2. DTA
  3. Dynamic information flow tracing
  4. data formats
  5. dynamic taint analysis
  6. file formats
  7. universal taint analysis

Qualifiers

  • Research-article

Funding Sources

  • Defense Advanced Research Projects Agency

Conference

ISSTA '24
Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
