DOI: 10.1145/3477132.3483544

Log-structured Protocols in Delos

Published: 26 October 2021

Abstract

Developers have access to a wide range of storage APIs and functionality in large-scale systems, such as relational databases, key-value stores, and namespaces. However, this diversity comes at a cost: each API is implemented by a complex distributed system that is difficult to develop and operate. Delos amortizes this cost by enabling different APIs on a shared codebase and operational platform. The primary innovation in Delos is a log-structured protocol: a fine-grained replicated state machine executing above a shared log that can be layered into reusable protocol stacks under different databases. We built and deployed two production databases using Delos at Facebook, creating nine different log-structured protocols in the process. We show via experiments and production data that log-structured protocols impose low overhead, while allowing optimizations that can improve latency by up to 100X (e.g., via leasing) and throughput by up to 2X (e.g., via batching).
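The core idea in the abstract, a stack of small replicated state machines layered over one shared log, can be illustrated with a minimal single-process Python sketch. Every name in it (`Engine`, `BaseEngine`, `BatchingEngine`, `KVStore`, `propose`, `sync`, `apply`, `upcall`) is a hypothetical illustration, not the Delos API, and a plain Python list stands in for the durable, totally ordered shared log. Each layer wraps a proposal with its own header on the way down to the log and strips that header on the way back up; the batching layer packs several proposals into one log entry, the kind of reusable optimization the abstract credits with throughput gains.

```python
# Single-process sketch of a log-structured protocol stack over a shared log.
# All names are hypothetical illustrations, not the Delos API; a plain list
# stands in for the durable, totally ordered shared log.
from typing import Any, Callable, Dict, List, Optional


class Engine:
    """One protocol layer: proposes downward, receives applied entries from below."""
    def __init__(self, below: Optional["Engine"] = None) -> None:
        self.below = below
        self.upcall: Callable[[Any], None] = lambda entry: None
        if below is not None:
            below.upcall = self.apply  # the layer below delivers entries here

    def propose(self, entry: Any) -> None:
        self.below.propose(entry)

    def sync(self) -> None:
        self.below.sync()

    def apply(self, entry: Any) -> None:
        self.upcall(entry)


class BaseEngine(Engine):
    """Bottom layer: appends to the shared log and replays it in log order."""
    def __init__(self, log: List[Any]) -> None:
        super().__init__(None)
        self.log = log
        self.applied = 0  # next log position this replica will apply

    def propose(self, entry: Any) -> None:
        self.log.append(entry)

    def sync(self) -> None:
        tail = len(self.log)  # snapshot the current tail, then catch up to it
        while self.applied < tail:
            entry = self.log[self.applied]
            self.applied += 1
            self.upcall(entry)


class BatchingEngine(Engine):
    """Packs up to max_batch proposals into a single log entry."""
    def __init__(self, below: Engine, max_batch: int) -> None:
        super().__init__(below)
        self.max_batch = max_batch
        self.pending: List[Any] = []

    def propose(self, entry: Any) -> None:
        self.pending.append(entry)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.below.propose(("batch", self.pending))  # wrap with this layer's header
            self.pending = []

    def sync(self) -> None:
        self.flush()  # so a replica's reads see its own buffered writes
        self.below.sync()

    def apply(self, entry: Any) -> None:
        tag, batch = entry  # unwrap this layer's header
        assert tag == "batch"
        for inner in batch:
            self.upcall(inner)


class KVStore:
    """Application state machine at the top of the stack."""
    def __init__(self, top: Engine) -> None:
        self.top = top
        self.data: Dict[str, Any] = {}
        top.upcall = self.apply

    def put(self, key: str, value: Any) -> None:
        self.top.propose((key, value))

    def get(self, key: str) -> Any:
        self.top.sync()  # catch up to the log tail, then read local state
        return self.data.get(key)

    def apply(self, entry: Any) -> None:
        key, value = entry
        self.data[key] = value


log: List[Any] = []
a = KVStore(BatchingEngine(BaseEngine(log), max_batch=2))  # replica A
b = KVStore(BatchingEngine(BaseEngine(log), max_batch=2))  # replica B
a.put("x", 1)
a.put("y", 2)  # fills A's batch: one log append carries both writes
b.put("x", 3)  # buffered at B until its batch fills or B syncs
print(len(log), a.get("x"))    # prints: 1 1
print(b.get("x"), len(log))    # prints: 3 2 (B's sync flushed its batch first)
print(a.get("x"), a.get("y"))  # prints: 3 2
```

Because both replicas apply the same log entries in the same order, they converge on the same state. The batching layer is invisible to the application above it, which is what would let a layer like this be reused under different databases in a stack of this shape.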







Published In

SOSP '21: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles
October 2021
899 pages
This work is licensed under a Creative Commons Attribution 4.0 International License.



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Consensus
  2. State Machine Replication


  • Research-article
  • Research
  • Refereed limited


SOSP '21

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%

Article Metrics

  • Downloads (last 12 months): 69
  • Downloads (last 6 weeks): 7
Reflects downloads up to 08 Mar 2025


Cited By

  • (2024) The Key Ideas Behind Boki's Shared Logs. ACM SIGOPS Operating Systems Review 58:1, pp. 7-14. DOI: 10.1145/3689051.3689054. Online publication date: 14-Aug-2024.
  • (2024) IndiLog: Bridging Scalability and Performance in Stateful Serverless Computing with Shared Logs. Proceedings of the 17th ACM International Systems and Storage Conference, pp. 1-13. DOI: 10.1145/3688351.3689159. Online publication date: 16-Sep-2024.
  • (2024) Boki: Towards Data Consistency and Fault Tolerance with Shared Logs in Stateful Serverless Computing. ACM Transactions on Computer Systems 42:3-4, pp. 1-35. DOI: 10.1145/3653072. Online publication date: 13-Nov-2024.
  • (2024) Optimizing Distributed Protocols with Query Rewrites. Proceedings of the ACM on Management of Data 2:1, pp. 1-25. DOI: 10.1145/3639257. Online publication date: 26-Mar-2024.
  • (2023) Fine-Grained Re-Execution for Efficient Batched Commit of Distributed Transactions. Proceedings of the VLDB Endowment 16:8, pp. 1930-1943. DOI: 10.14778/3594512.3594523. Online publication date: 1-Apr-2023.
  • (2023) Halfmoon: Log-Optimal Fault-Tolerant Stateful Serverless Computing. Proceedings of the 29th Symposium on Operating Systems Principles, pp. 314-330. DOI: 10.1145/3600006.3613154. Online publication date: 23-Oct-2023.
  • (2023) DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps. Proceedings of the ACM on Management of Data 1:2, pp. 1-27. DOI: 10.1145/3589262. Online publication date: 20-Jun-2023.
  • (2023) FlexLog: A Shared Log for Stateful Serverless Computing. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, pp. 195-209. DOI: 10.1145/3588195.3592993. Online publication date: 7-Aug-2023.
  • (2021) Boki: Stateful Serverless Computing with Shared Logs. Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, pp. 691-707. DOI: 10.1145/3477132.3483541. Online publication date: 26-Oct-2021.