
Automatic Discovery and Synthesis of Checksum Algorithms from Binary Data Samples

Authors: Jared Chandler, Kathleen Fisher

PLAS'20: Proceedings of the 15th Workshop on Programming Languages and Analysis for Security

Pages 25 - 34

https://doi.org/10.1145/3411506.3417599

Published: 09 November 2020

Abstract

Reverse engineering unknown binary message formats is an important part of security research. Error-detecting codes such as checksums and Cyclic Redundancy Check codes (CRCs) are commonly added to messages as a guard against corrupt or untrusted input. Before an analyst can manufacture input for software that uses checksums, they must discover the algorithm used to calculate a valid checksum. To address this need, we have developed a program-synthesis-based approach for detecting and reverse-engineering checksum algorithms automatically.

Our approach takes a small set of binary messages as input and automatically returns a Python implementation of the checksum algorithm if one can be found. It first performs a search over the message space to identify the location of the checksum and then uses program synthesis to identify the operations performed on the message to compute the checksum. We return to the user runnable code that both calculates a checksum from a message and validates a message against the checksum algorithm. We also generate unit tests, allowing the user to verify that the synthesized checksum algorithm is correct with respect to the input messages.
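The two-phase idea described above (locate the checksum, then identify how it is computed) can be illustrated with a deliberately small sketch. This is not the authors' synthesis procedure: it assumes a one-byte checksum at a fixed offset from the start of each message, and it replaces program synthesis with a brute-force choice among three hand-picked candidate functions. The names sum8, xor8, neg_sum8, find_checksum, and validate are all hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional, Tuple


def sum8(data: bytes) -> int:
    """Sum of all bytes, truncated to 8 bits."""
    return sum(data) & 0xFF


def xor8(data: bytes) -> int:
    """XOR of all bytes."""
    return reduce(lambda a, b: a ^ b, data, 0)


def neg_sum8(data: bytes) -> int:
    """Two's complement of the 8-bit byte sum."""
    return (-sum(data)) & 0xFF


# Hypothetical candidate space; a real synthesizer would search a far
# richer space of operations than this fixed table.
CANDIDATES: Dict[str, Callable[[bytes], int]] = {
    "sum8": sum8,
    "xor8": xor8,
    "neg_sum8": neg_sum8,
}


def find_checksum(messages: List[bytes]) -> Optional[Tuple[int, str]]:
    """Return the first (position, algorithm name) consistent with every
    message, or None if no candidate fits."""
    shortest = min(len(m) for m in messages)
    for pos in range(shortest):
        for name, fn in CANDIDATES.items():
            # Treat byte `pos` as the checksum over all remaining bytes.
            if all(fn(m[:pos] + m[pos + 1:]) == m[pos] for m in messages):
                return pos, name
    return None


def validate(message: bytes, pos: int, name: str) -> bool:
    """Check one message against a discovered (position, algorithm) pair."""
    body = message[:pos] + message[pos + 1:]
    return CANDIDATES[name](body) == message[pos]


samples = [
    bytes.fromhex("10203060"),
    bytes.fromhex("01020306"),
    bytes.fromhex("ff020506"),
]
result = find_checksum(samples)
print(result)
```

For these sample messages the search settles on the last byte as an 8-bit sum. Note that a search this simple cannot always pin down the location: when the checksum is the XOR of the other bytes, every byte of the message equals the XOR of the rest, so every position is consistent with xor8.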

We created the Tufts Checksum Corpus, comprising 12 checksum inference questions collected from posts on reverse engineering question-and-answer sites and 2 instances of common internet protocol checksums.
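As an illustration of what a common internet protocol checksum looks like, the sketch below implements the 16-bit ones' complement checksum described in RFC 1071 and used by IPv4 headers and ICMP. Whether this particular algorithm is one of the two corpus instances is an assumption; the abstract does not say. The function name internet_checksum and the sample header bytes are illustrative only.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Sample 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed.
header_zeroed = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header_zeroed)

# Re-running the function over a header that already contains a correct
# checksum yields 0, which is how a receiver validates the header.
header_full = header_zeroed[:10] + checksum.to_bytes(2, "big") + header_zeroed[12:]
print(hex(checksum), internet_checksum(header_full))
```

The same function both computes and validates, which mirrors the calculate/validate pairing the abstract describes for the synthesized code.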

Our approach successfully synthesized the underlying checksum algorithms for 12 out of 14 cases in our test suite.


Index Terms

Security and privacy

Published In


PLAS'20: Proceedings of the 15th Workshop on Programming Languages and Analysis for Security

November 2020

46 pages

ISBN:9781450380928

DOI:10.1145/3411506

Program Chairs: Alley Stoughton (Boston University, USA), Marco Vassena (CISPA, Germany)

Copyright © 2020 ACM.


Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Defense Advanced Research Projects Agency (DARPA)
Air Force Research Laboratory (AFRL)

Conference

CCS '20

Sponsor:

SIGSAC

CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security

November 13, 2020

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 43 of 77 submissions, 56%
