DOI: 10.1145/3377813.3381369

Debugging crashes using continuous contrast set mining

Published: 18 September 2020

Abstract

Facebook operates a family of services used by over two billion people daily on a huge variety of mobile devices. Many devices are configured to upload crash reports should the app crash for any reason. Engineers monitor and triage millions of crash reports logged each day to check for bugs, regressions, and any other quality problems. Debugging groups of crashes is a manually intensive process that requires deep domain expertise and close inspection of traces and code, often under time constraints.
We use contrast set mining, a form of discriminative pattern mining, to learn what distinguishes one group of crashes from another. Prior work relies on discretization to apply contrast mining to continuous data. We propose the first direct application of contrast learning to continuous data, without the need for discretization. We also define a weighted anomaly score that unifies continuous and categorical contrast sets while mitigating bias, as well as uncertainty measures that communicate confidence to developers. We demonstrate the value of these statistical improvements by applying them to a challenging dataset from Facebook production logs, where we achieve a 40x speedup over baseline approaches that use discretization.
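
To make the idea of contrasting groups on continuous data more concrete, the following is a minimal sketch of how a continuous feature can be compared between a group of crashes and a baseline without binning the values first. The abstract does not specify the paper's scoring function, so the use of Cohen's d and Welch's t-statistic here is an assumption, and the feature names (`free_memory_mb`, `uptime_s`) and sample values are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled std)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def welch_t(a, b):
    """Welch's t-statistic, which does not assume equal group variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_continuous_features(target, baseline):
    """Rank features by |Cohen's d| between a target group and a baseline.

    Both arguments map a feature name to a list of raw continuous values;
    no discretization step is applied.
    """
    scores = []
    for name in target:
        d = cohens_d(target[name], baseline[name])
        t = welch_t(target[name], baseline[name])
        scores.append((name, d, t))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Hypothetical values: the crash group of interest vs. all other crashes.
target = {
    "free_memory_mb": [110.0, 95.0, 120.0, 105.0, 90.0],
    "uptime_s": [300.0, 420.0, 380.0, 350.0, 410.0],
}
baseline = {
    "free_memory_mb": [510.0, 480.0, 530.0, 495.0, 520.0],
    "uptime_s": [310.0, 400.0, 390.0, 360.0, 405.0],
}

ranking = rank_continuous_features(target, baseline)
for name, d, t in ranking:
    print(f"{name}: d={d:.2f}, t={t:.2f}")
```

In this toy data the memory feature separates the two groups far more strongly than uptime, so it ranks first. A production system would also need a correction for multiple hypothesis testing, which the author tags list as a topic of the paper.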

Published In

ICSE-SEIP '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice
June 2020
258 pages
ISBN: 9781450371230
DOI: 10.1145/3377813
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

In-Cooperation

  • KIISE: Korean Institute of Information Scientists and Engineers
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. contrast set mining
  2. crash analysis
  3. descriptive rules
  4. emerging patterns
  5. multiple hypothesis testing
  6. rule learning
  7. subgroup discovery

Qualifiers

  • Research-article

Conference

ICSE '20