DOI:10.1145/2830903.2830908
invited-talk

Evolution of fault tolerance

Published: 04 October 2015

Abstract

Ken Birman's talk focused on controversies surrounding fault tolerance and consistency. Looking at the 1990s, he pointed to the debate around the so-called CATOCS question (CATOCS refers to causally and totally ordered communication primitives) and drew a parallel to the more modern debate about consistency at cloud scale (often referred to as the CAP conjecture). Ken argued that the underlying tension is actually one that pits basic principles of the field against the seemingly unavoidable complexity of mechanisms strong enough to solve consensus, particularly the family of protocols with Paxos-like structures. Over time, this tension was resolved: he concluded that today we finally know how to build very fast and scalable solutions (those who attended SOSP 2015 itself saw ten or more papers on such topics). On the other hand, Ken sees a new generation of challenges on the horizon: cloud-scale applications that will need a novel mix of scalable consistency and real-time guarantees, will need to leverage new hardware options (RDMA, NVRAM, and other "middle memory" options), and may need to be restructured to reflect a control-plane/data-plane split. These trends invite a new look at what has become a core topic for the SOSP community.

Supplementary Material

Reflections on the History of Operating Systems Research in Fault Tolerance (invited essay) (s1-birman.pdf)
Essay accompanying the Evolution of Fault Tolerance article and talk
MP4 File (a7.mp4)

Cited By

  • (2022) Mandrake: multiagent systems as a basis for programming fault-tolerant decentralized applications. Autonomous Agents and Multi-Agent Systems 36:1. DOI:10.1007/s10458-021-09540-8. Online publication date: 8-Feb-2022

Published In

SOSP '15: SOSP History Day 2015
October 2015
391 pages
ISBN:9781450340175
DOI:10.1145/2830903
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SOSP '15

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%

Upcoming Conference

SOSP '25
ACM SIGOPS 31st Symposium on Operating Systems Principles
October 13 - 16, 2025
Seoul, Republic of Korea
