PolyTracker: Whole-Input Dynamic Information Flow Tracing

Authors: Marek Surovič, Facundo Tuesca, Joseph Sweeney, Bradford Larsen

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 1841 - 1845

https://doi.org/10.1145/3650212.3685313

Published: 11 September 2024

Abstract

We present PolyTracker, a whole-program, whole-input dynamic information flow tracing (DIFT) framework. Given an LLVM-compatible codebase, or a binary that has been lifted to LLVM intermediate representation (IR), PolyTracker compiles it and adds static instrumentation. The instrumented program runs normally with modest performance overhead, and additionally outputs a runtime trace artifact in the co-designed TDAG (Tainted Directed Acyclic Graph) format. TDAGs can be post-processed for a variety of analyses, including tracing every input byte through program execution. TDAGs can be generated either by running the program over a corpus of inputs or by employing a randomized input generator such as a fuzzer. PolyTracker traces (TDAGs) are useful for the very localized, targeted dynamic program analysis typical of smaller-scale DIFT, but they are primarily intended for whole-program runtime exploration and bug finding, granular information-flow diffing between program runs, and comparisons of implementations of the same input specification, all without any need to emulate and instrument the entire running environment. For user-friendliness and reproducibility, the software repository provides a number of examples of PolyTracker-instrumented builds of popular open-source software projects. We also provide an analysis library and REPL, written in Python, that are designed to assist users with operating over TDAGs.
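To make the idea of tracing every input byte through a taint graph concrete, the following is a minimal toy sketch in Python. It is not PolyTracker's API and does not reflect the actual TDAG file layout; the names `Node`, `ToyTaintDag`, and `checksum_with_labels` are hypothetical and exist only for illustration. The sketch assumes a simple model in which each label is either a source (one input byte offset) or a union of two earlier labels, so that any label can be resolved back to the set of input offsets that influenced it.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """One label in a toy taint DAG (hypothetical, not the real TDAG layout)."""
    offset: Optional[int] = None   # set for source nodes: an input byte offset
    parents: Tuple[int, ...] = ()  # set for union nodes: the labels combined


class ToyTaintDag:
    """Illustrative label store; label 0 is reserved for 'untainted'."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self._next_label = 1

    def _add(self, node: Node) -> int:
        label = self._next_label
        self._next_label += 1
        self.nodes[label] = node
        return label

    def source(self, offset: int) -> int:
        """Create a label for a single input byte."""
        return self._add(Node(offset=offset))

    def union(self, a: int, b: int) -> int:
        """Create a label for a value derived from labels a and b."""
        if a == 0:
            return b
        if b == 0 or a == b:
            return a
        return self._add(Node(parents=(a, b)))

    def input_offsets(self, label: int) -> FrozenSet[int]:
        """Walk the DAG back to its sources and collect input byte offsets."""
        seen, offsets, stack = set(), set(), [label]
        while stack:
            current = stack.pop()
            if current == 0 or current in seen:
                continue
            seen.add(current)
            node = self.nodes[current]
            if node.offset is not None:
                offsets.add(node.offset)
            stack.extend(node.parents)
        return frozenset(offsets)


def checksum_with_labels(data: bytes, dag: ToyTaintDag) -> Tuple[int, int]:
    """Compute a one-byte additive checksum while propagating labels by hand."""
    total, label = 0, 0
    for offset, byte in enumerate(data):
        total = (total + byte) & 0xFF
        label = dag.union(label, dag.source(offset))
    return total, label
```

In this sketch, calling `checksum_with_labels` on an input and then `input_offsets` on the returned label recovers which input bytes flowed into the checksum. A real instrumented program would have such propagation inserted by the compiler rather than written by hand.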


Index Terms

1. Security and privacy
  1. Software and application security
    1. Software reverse engineering
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging
  2. Software organization and properties
    1. Software functional properties
      1. Formal methods
        Dynamic analysis


Published In

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

September 2024

1928 pages

ISBN:9798400706127

DOI:10.1145/3650212

General Chair: Maria Christakis, TU Wien, Austria

Program Chair: Michael Pradel, University of Stuttgart, Germany

Copyright © 2024. Copyright is held by the owner/author(s). Publication rights licensed to ACM.

ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Defense Advanced Research Projects Agency

Conference

ISSTA '24

Sponsor:

SIGSOFT

ISSTA '24: 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

September 16 - 20, 2024

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

