research-article

Semantic crash bucketing

Authors: Rijnard van Tonder, John Kotheimer, Claire Le Goues

ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering

Pages 612 - 622

https://doi.org/10.1145/3238147.3238200

Published: 03 September 2018

Abstract

Precise crash triage is important for automated dynamic testing tools such as fuzzers. At scale, fuzzers produce millions of crashing inputs. Fuzzers use heuristics, such as stack hashes, to cut down on duplicate bug reports. These heuristics are fast but often imprecise: even after deduplication, hundreds of uniquely reported crashes can still correspond to the same bug. The remaining crashes must be inspected manually, which takes considerable effort. In this paper we present Semantic Crash Bucketing, a generic method for precise crash bucketing using program transformation. Semantic Crash Bucketing maps crashing inputs to unique bugs as a function of changing the program (i.e., a semantic delta). We observe that a real bug fix precisely identifies the crashes belonging to the same bug, namely the crashing inputs that no longer crash once the fix is applied. Our insight is to approximate real bug fixes with lightweight program transformation to obtain the same level of precision. Our approach uses (a) patch templates and (b) semantic feedback from the program to automatically generate and apply approximate fixes for general bug classes. Our evaluation shows that approximate fixes are competitive with true fixes for crash bucketing and significantly outperform the built-in deduplication techniques of three state-of-the-art fuzzers.
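The abstract contrasts stack-hash deduplication with bucketing by the effect of a fix. The toy Python sketch below illustrates that contrast under loudly stated assumptions; it is not the authors' implementation. The target program, its two bugs, the fix names `bounds_check` and `zero_check`, and every helper function are hypothetical, and each "approximate fix" is a hard-coded guard switched on by a flag rather than a patch generated from a template and applied to source code. Because the same out-of-bounds bug is reached through two different callers, a hash of the innermost stack frames reports it twice, whereas the guard that stops the crash places both inputs in one bucket.

```python
import hashlib
import traceback
from typing import Dict, FrozenSet, List, Optional

# Hypothetical toy target with two distinct bugs. The `fixes` set switches on
# hard-coded guards that stand in for approximate fixes; generating such fixes
# from patch templates is NOT modeled here.
def read_at(buf: bytes, i: int, fixes: FrozenSet[str]) -> int:
    if "bounds_check" in fixes and i >= len(buf):
        return 0  # approximate fix for bug A
    return buf[i]  # bug A: IndexError when i is past the end of buf

def scale(x: int, d: int, fixes: FrozenSet[str]) -> int:
    if "zero_check" in fixes and d == 0:
        return 0  # approximate fix for bug B
    return x // d  # bug B: ZeroDivisionError when d == 0

def path_one(data: bytes, fixes: FrozenSet[str]) -> int:
    return read_at(data, data[1], fixes)

def path_two(data: bytes, fixes: FrozenSet[str]) -> int:
    return read_at(data, data[1], fixes) + 1

def target(data: bytes, fixes: FrozenSet[str] = frozenset()) -> int:
    # data[0] selects a caller, data[1] is an index, data[2] is a divisor.
    v = path_one(data, fixes) if data[0] % 2 == 0 else path_two(data, fixes)
    return scale(v, data[2], fixes)

def crash_frames(data: bytes, fixes: FrozenSet[str] = frozenset()) -> Optional[List[str]]:
    """Run the target; return the function names on the crashing stack, or None."""
    try:
        target(data, fixes)
        return None
    except (IndexError, ZeroDivisionError) as exc:
        return [f.name for f in traceback.extract_tb(exc.__traceback__)]

def stack_hash(frames: List[str], depth: int = 3) -> str:
    """Baseline heuristic: hash only the innermost `depth` frames."""
    return hashlib.sha1("|".join(frames[-depth:]).encode()).hexdigest()[:8]

def bucket_by_stack_hash(inputs: List[bytes]) -> Dict[str, List[bytes]]:
    buckets: Dict[str, List[bytes]] = {}
    for data in inputs:
        frames = crash_frames(data)
        if frames is not None:
            buckets.setdefault(stack_hash(frames), []).append(data)
    return buckets

def bucket_by_fix(inputs: List[bytes], candidate_fixes: List[str]) -> Dict[str, List[bytes]]:
    """Fix-based bucketing: an input joins the bucket of the first fix that stops its crash."""
    buckets: Dict[str, List[bytes]] = {}
    for data in inputs:
        if crash_frames(data) is None:
            continue  # input does not crash the unpatched target
        for fix in candidate_fixes:
            if crash_frames(data, frozenset({fix})) is None:
                buckets.setdefault(fix, []).append(data)
                break
        else:
            # No single candidate fix removes this crash.
            buckets.setdefault("unbucketed", []).append(data)
    return buckets

if __name__ == "__main__":
    inputs = [bytes([0, 9, 1]), bytes([1, 7, 1]), bytes([2, 0, 0]),
              bytes([3, 1, 0]), bytes([4, 2, 2])]  # the last input does not crash
    print(len(bucket_by_stack_hash(inputs)))  # 3: bug A is split across its two callers
    print(sorted(bucket_by_fix(inputs, ["bounds_check", "zero_check"])))  # ['bounds_check', 'zero_check']
```

Assigning each input to the first single fix that removes its crash is a simplification chosen for brevity: an input that triggers several bugs at once needs a combination of fixes, and this sketch simply reports it as `unbucketed`.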



Index Terms

1. Security and privacy
  1. Software and application security
    1. Software security engineering
2. Software and its engineering
  1. Software creation and management



Information

Published In


ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering

September 2018

955 pages

ISBN:9781450359375

DOI:10.1145/3238147

General Chair: Marianne Huchard, University of Montpellier, France

Program Chairs: Christian Kästner, Carnegie Mellon University, USA; Gordon Fraser, University of Passau, Germany

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence
CNRS: Centre National de la Recherche Scientifique
SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article

Conference

ASE '18


ASE '18: 33rd ACM/IEEE International Conference on Automated Software Engineering

September 3 - 7, 2018

Montpellier, France

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%


Bibliometrics & Citations

Total Citations: 25
Total Downloads: 365
Downloads (Last 12 months): 56
Downloads (Last 6 weeks): 13

Reflects downloads up to 23 Feb 2025


Cited By

Qian C, Zhang M, Nie Y, Lu S, Cao H (2023). A Survey on Bug Deduplication and Triage Methods from Multiple Points of View. Applied Sciences 13(15), 8788. Online publication date: 29-Jul-2023.
https://doi.org/10.3390/app13158788
Gu J, Sun X, Zhang W, Jiang Y, Wang C, Vaziri M, Legunsen O, Xu T, Druschel P, Kaufmann A, Mace J, Flinn J, Seltzer M (2023). Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management. Proceedings of the 29th Symposium on Operating Systems Principles, 96–112. Online publication date: 23-Oct-2023.
https://dl.acm.org/doi/10.1145/3600006.3613161
Tang X, Zhou H, Zhang M, Zhang Y, Wu G, Lu H, Yu X, Tian Z (2023). Research on the Exploitability of Binary Software Vulnerabilities. 2023 IEEE 12th International Conference on Cloud Networking (CloudNet), 403–407. Online publication date: 1-Nov-2023.
https://doi.org/10.1109/CloudNet59005.2023.10490070
Kallingal Joshy A, Le W (2022). FuzzerAid: Grouping Fuzzed Crashes Based On Fault Signatures. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 1–12. Online publication date: 10-Oct-2022.
https://dl.acm.org/doi/10.1145/3551349.3556959
Song Y, Xie X, Zhang X, Liu Q, Gao R (2022). Evolving Ranking-Based Failure Proximities for Better Clustering in Fault Isolation. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 1–13. Online publication date: 10-Oct-2022.
https://dl.acm.org/doi/10.1145/3551349.3556922
Jang D, Askar A, Yun I, Tong S, Cai Y, Kim T (2022). Fuzzing@Home: Distributed Fuzzing on Untrusted Heterogeneous Clients. Proceedings of the 25th International Symposium on Research in Attacks, Intrusions and Defenses, 1–16. Online publication date: 26-Oct-2022.
https://dl.acm.org/doi/10.1145/3545948.3545971
Zhang X, Chen J, Feng C, Li R, Diao W, Zhang K, Lei J, Tang C, Dwyer M, Damian D, Zeller A (2022). DeFault. Proceedings of the 44th International Conference on Software Engineering, 635–646. Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3512760
Shetty M, Bansal C, Nath S, Bowles S, Wang H, Arman O, Ahari S, Dwyer M, Damian D, Zeller A (2022). DeepAnalyze. Proceedings of the 44th International Conference on Software Engineering, 549–560. Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3512759
Wu M, Jiang L, Xiang J, Huang Y, Cui H, Zhang L, Zhang Y, Dwyer M, Damian D, Zeller A (2022). One fuzzing strategy to rule them all. Proceedings of the 44th International Conference on Software Engineering, 1634–1645. Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3510174
Zhang C, Chen B, Peng X, Zhao W, Dwyer M, Damian D, Zeller A (2022). BuildSheriff. Proceedings of the 44th International Conference on Software Engineering, 312–324. Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3510132
