DOI: 10.1145/2484239.2484293
extended-abstract

Brief announcement: techniques for programmatically troubleshooting distributed systems

Published: 22 July 2013

Abstract

The distributed systems research community has developed many provably correct algorithms and abstractions that are in wide use. However, practical implementations of distributed systems often contain many bugs, and practitioners spend much of their time troubleshooting these bugs. In this paper we present an algorithm, retrospective causal inference, to ease the burden of troubleshooting. We end by enumerating several open research problems related to the troubleshooting process.


Published In

PODC '13: Proceedings of the 2013 ACM symposium on Principles of distributed computing
July 2013
422 pages
ISBN:9781450320658
DOI:10.1145/2484239
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automation
  2. tools
  3. troubleshooting

Qualifiers

  • Extended-abstract

Conference

PODC '13: ACM Symposium on Principles of Distributed Computing
July 22 - 24, 2013
Montréal, Québec, Canada

Acceptance Rates

PODC '13 paper acceptance rate: 37 of 145 submissions (26%)
Overall acceptance rate: 740 of 2,477 submissions (30%)
