tutorial

ReproLite: A Lightweight Tool to Quickly Reproduce Hard System Bugs

Published: 03 November 2014

Abstract

Cloud systems are now ubiquitous: they store and process the enormous volumes of data generated by Internet users. These systems run on hundreds of commodity machines, and their execution is highly non-deterministic, involving thousands of threads and hundreds of processes. As a result, bugs in cloud systems are hard to understand, reproduce, and fix. The state of the art in industrial debugging is to log messages during execution and to consult those messages later when an error occurs. ReproLite augments this already widespread log-based debugging process: it lets testers quickly and easily specify the conjectures they form, from execution logs, about the cause of an error (or bug), and then automatically validates those conjectures.
ReproLite includes a domain-specific language (DSL) in which testers can specify all aspects of a potential scenario that causes a given bug (e.g., specific workloads, execution operations and their order, and environment non-determinism). Given such a scenario, ReproLite can enforce its conditions during system execution. Potentially buggy scenarios can also be generated automatically from a sequence of log messages that a tester believes indicates the cause of the bug. We applied ReproLite to 11 bugs from two popular cloud systems, Cassandra and HBase, and were able to reproduce all of them. We report on our experience using ReproLite on those bugs.
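
The idea of enforcing a conjectured order of operations during execution can be illustrated with a small sketch. This is not ReproLite's DSL or implementation, which the abstract does not show; it is a minimal Python illustration under stated assumptions. The `OrderGate` class, the `run_scenario` helper, and the event names are all hypothetical. Each thread announces the event it is about to perform and is blocked until that event is next in the scripted order.

```python
import threading


class OrderGate:
    """Hypothetical helper: lets events proceed only in a scripted order."""

    def __init__(self, order):
        self._order = list(order)   # conjectured order of operations
        self._next = 0              # index of the next event allowed to run
        self._cond = threading.Condition()
        self.trace = []             # events in the order they actually ran

    def reach(self, event, timeout=5.0):
        """Block the calling thread until `event` is next in the script."""
        with self._cond:
            enabled = self._cond.wait_for(
                lambda: self._next < len(self._order)
                and self._order[self._next] == event,
                timeout=timeout,
            )
            if not enabled:
                raise TimeoutError(f"event {event!r} never became enabled")
            self.trace.append(event)
            self._next += 1
            # Wake the other waiters so the next scripted event can proceed.
            self._cond.notify_all()


def run_scenario(order):
    """Run one thread per event and return the order that was enforced."""
    gate = OrderGate(order)
    # Start threads in reverse so the natural schedule would likely differ
    # from the scripted one; the gate still forces the scripted order.
    threads = [
        threading.Thread(target=gate.reach, args=(event,))
        for event in reversed(order)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.trace


if __name__ == "__main__":
    # Event names are made up for illustration only.
    print(run_scenario(["drop_table", "write_row", "create_table"]))
```

In a real system, the `reach` calls would sit at the points in the code that correspond to the log messages a tester suspects, so that rerunning the workload replays the conjectured interleaving.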


Published In

SOCC '14: Proceedings of the ACM Symposium on Cloud Computing
November 2014
383 pages
ISBN: 9781450332521
DOI: 10.1145/2670979

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cloud Computing
  2. Debugging
  3. Hard System Bug
  4. Lightweight

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

SOCC '14: ACM Symposium on Cloud Computing
November 3 - 5, 2014
Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%


Cited By

  • (2018) FCatch. ACM SIGPLAN Notices 53:2 (419-431). DOI: 10.1145/3296957.3177161. Online publication date: 19-Mar-2018
  • (2018) FCatch. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (419-431). DOI: 10.1145/3173162.3177161. Online publication date: 19-Mar-2018
  • (2018) Controlling Cloud-Based Systems for Elasticity Test Reproduction. Cloud Computing and Service Science (200-222). DOI: 10.1007/978-3-319-94959-8_11. Online publication date: 14-Jul-2018
  • (2017) End-to-end regression testing for distributed systems. Proceedings of the 18th Doctoral Symposium of the 18th International Middleware Conference (9-12). DOI: 10.1145/3152688.3152692. Online publication date: 11-Dec-2017
  • (2017) Pensieve. Proceedings of the 26th Symposium on Operating Systems Principles (19-33). DOI: 10.1145/3132747.3132768. Online publication date: 14-Oct-2017
  • (2015) Tackling the reproducibility problem in storage systems research with declarative experiment specifications. Proceedings of the 10th Parallel Data Storage Workshop (25-30). DOI: 10.1145/2834976.2834979. Online publication date: 15-Nov-2015
  • (2014) What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems. Proceedings of the ACM Symposium on Cloud Computing (1-14). DOI: 10.1145/2670979.2670986. Online publication date: 3-Nov-2014
