Fail at scale

Published: 23 October 2015

Abstract

Reliability in the face of rapid change

References

[1] CoDel (controlled delay) algorithm; http://queue.acm.org/detail.cfm?id=2209336.
[2] Cubism; https://square.github.io/cubism/.
[3] HipHop Virtual Machine (HHVM); http://bit.ly/1Qw68bz.
[4] Thrift framework; https://github.com/facebook/fbthrift.
[5] Wangle library; https://github.com/facebook/wangle/blob/master/wangle/concurrent/Codel.cpp.

Published In

Communications of the ACM  Volume 58, Issue 11
November 2015
112 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/2838899
Editor: Moshe Y. Vardi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Popular
  • Refereed

Article Metrics

  • Downloads (Last 12 months)312
  • Downloads (Last 6 weeks)58
Reflects downloads up to 08 Mar 2025

Cited By

  • (2023) Fail through the Cracks: Cross-System Interaction Failures in Modern Cloud Systems. Proceedings of the Eighteenth European Conference on Computer Systems. DOI: 10.1145/3552326.3587448 (pp. 433-451). Online publication date: 8-May-2023
  • (2023) Test Selection for Unified Regression Testing. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). DOI: 10.1109/ICSE48619.2023.00145 (pp. 1687-1699). Online publication date: May-2023
  • (2023) Vicious Cycles in Distributed Software Systems. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering. DOI: 10.1109/ASE56229.2023.00032 (pp. 91-103). Online publication date: 11-Nov-2023
  • (2021) An Evolutionary Study of Configuration Design and Implementation in Cloud Systems. Proceedings of the 43rd International Conference on Software Engineering. DOI: 10.1109/ICSE43902.2021.00029 (pp. 188-200). Online publication date: 22-May-2021
  • (2020) Testing configuration changes in context to prevent production failures. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation. DOI: 10.5555/3488766.3488808 (pp. 735-751). Online publication date: 4-Nov-2020
  • (2020) Overload control for µs-scale RPCs with breakwater. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation. DOI: 10.5555/3488766.3488783 (pp. 299-314). Online publication date: 4-Nov-2020
  • (2020) Understanding and discovering software configuration dependencies in cloud and datacenter systems. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. DOI: 10.1145/3368089.3409727 (pp. 362-374). Online publication date: 8-Nov-2020
  • (2019) Taiji. Proceedings of the 27th ACM Symposium on Operating Systems Principles. DOI: 10.1145/3341301.3359655 (pp. 430-446). Online publication date: 27-Oct-2019
  • (2018) Maelstrom. Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation. DOI: 10.5555/3291168.3291196 (pp. 373-389). Online publication date: 8-Oct-2018
  • (2018) A Large Scale Study of Data Center Network Reliability. Proceedings of the Internet Measurement Conference 2018. DOI: 10.1145/3278532.3278566 (pp. 393-407). Online publication date: 31-Oct-2018
