skip to main content
10.1145/3064176.3064210acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

Saturn: a Distributed Metadata Service for Causal Consistency

Published: 23 April 2017 Publication History

Abstract

This paper presents the design, implementation, and evaluation of Saturn, a metadata service for geo-replicated systems. Saturn can be used in combination with several distributed and replicated data services to ensure that remote operations are made visible in an order that respects causality, a requirement central to many consistency criteria.
Saturn addresses two key unsolved problems inherent to previous approaches. First, it eliminates the tradeoff between throughput and data freshness, when deciding what metadata to use for tracking causality. Second, it enables genuine partial replication, a key property to ensure scalability when the number of geo-locations increases. Saturn addresses these challenges while keeping metadata size constant, independently of the number of clients, servers, data partitions, and locations. By decoupling metadata management from data dissemination, and by using clever metadata propagation techniques, it ensures that the throughput and visibility latency of updates on a given item are (mostly) shielded from operations on other items or locations.
We evaluate Saturn in Amazon EC2 using realistic benchmarks under both full and partial geo-replication. Results show that weakly consistent datastores can lean on Saturn to upgrade their consistency guarantees to causal consistency with a negligible penalty on performance.

References

[1]
M. Ahamad, G. Neiger, J. Burns, P. Kohli, and P. Hutto. Causal memory: definitions, implementation, and programming. Distributed Computing, 9(1):37--49, 1995.
[2]
P. Ajoux, N. Bronson, S. Kumar, W. Lloyd, and K. Veeraraghavan. Challenges to adopting stronger consistency at scale. In Proceeding of the 15th Workshop on Hot Topics in Operating Systems, HotOS '15, 2015.
[3]
D. D. Akkoorath, A. Z. Tomsic, M. Bravo, Z. Li, T. Crain, A. Bieniusa, N. Preguia, and M. Shapiro. Cure: Strong semantics meets high availability and low latency. In Proceeding of the IEEE 36th International Conference on Distributed Computing Systems, ICDCS '16, pages 405--414, 2016.
[4]
M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In Proceedings of the ACM SIGCOMM Conference on Data Communication, SIGCOMM '08, pages 63--74, Seattle, WA, USA, 2008.
[5]
S. Almeida, J. a. Leitão, and L. Rodrigues. ChainReaction: A causal+ consistent datastore based on chain replication. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, pages 85--98, Prague, Czech Republic, 2013.
[6]
D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. Fawn: A fast array of wimpy nodes. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP '09, pages 1--14, Big Sky, Montana, USA, 2009.
[7]
T. G. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. Linkbench: A database benchmark based on the facebook social graph. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 1185--1196, New York, New York, USA, 2013.
[8]
H. Attiya, F. Ellen, and A. Morrison. Limitations of highly-available eventually-consistent data stores. In Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC '15, pages 385--394, Donostia-San Sebastián, Spain, 2015.
[9]
P. Bailis, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. The potential dangers of causal consistency and an explicit solution. In Proceedings of the 3rd ACM Symposium on Cloud Computing, SoCC '12, pages 22:1--22:7, San Jose, California, 2012.
[10]
P. Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Bolton causal consistency. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 761--772, New York, New York, USA, 2013.
[11]
P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Feral concurrency control: An empirical investigation of modern application integrity. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pages 1327--1342, Melbourne, Victoria, Australia, 2015.
[12]
V. Balegas, S. Duarte, C. Ferreira, R. Rodrigues, N. Preguiça, M. Najafzadeh, and M. Shapiro. Putting consistency back into eventual consistency. In Proceedings of the 10th European Conference on Computer Systems, EuroSys '15, pages 6:1--6:16, Bordeaux, France, 2015.
[13]
Basho. Basho Bench. http://github.com/basho/basho_bench,.
[14]
Basho. Riak core. http://github.com/basho/riak_core,.
[15]
F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida. Characterizing user behavior in online social networks. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, IMC '09, pages 49--62, Chicago, Illinois, USA, 2009.
[16]
K. Birman, A. Schiper, and P. Stephenson. Lightweight causal and atomic group multicast. ACM Trans. Comput. Syst., 9(3), Aug. 1991.
[17]
M. Bravo, N. Diegues, J. Zeng, P. Romano, and L. Rodrigues. On the use of clocks to enforce consistency in the cloud. IEEE Data Engineering Bulleting, 38(1):18--31, 2015.
[18]
E. A. Brewer. Towards robust distributed systems. In Keynote at the ACM Symposium on Principles of Distributed Computing, PODC, 2000.
[19]
A. Brodersen, S. Scellato, and M. Wattenhofer. Youtube around the world: Geographic popularity of videos. In Proceedings of the 21st International Conference on World Wide Web, WWW '12, pages 241--250, Lyon, France, 2012.
[20]
M. Castro and B. Liskov. Practical byzantine fault tolerance. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, OSDI '99, pages 173--186, New Orleans, Louisiana, USA, 1999.
[21]
B. Charron-Bost. Concerning the size of logical clocks in distributed systems. Information Processing Letters, 39(1): 11--16, July 1991.
[22]
D. R. Cheriton and D. Skeen. Understanding the limitations of causally and totally ordered communication. In Proceedings of the 14th ACM Symposium on Operating Systems Principles, SOSP '93, pages 44--57, Asheville, North Carolina, USA, 1993.
[23]
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's globally-distributed database. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, pages 251--264, Hollywood, CA, USA, 2012.
[24]
M. Dahlin, L. Gao, A. Nayate, A. Venkataramana, P. Yalagandula, and J. Zheng. Practi replication. In Proceedings of the 3rd USENIX Symposium on Networked Systems Design and Implementation, NSDI '06, 2006.
[25]
J. Du, S. Elnikety, A. Roy, andW. Zwaenepoel. Orbe: Scalable causal consistency using dependency matrices and physical clocks. In Proceedings of the 4th Annual Symposium on Cloud Computing, SOCC '13, pages 11:1--11:14, Santa Clara, California, 2013.
[26]
J. Du, C. Iorgulescu, A. Roy, and W. Zwaenepoel. Gentlerain: Cheap and scalable causal consistency with physical clocks. In Proceedings of the 5th ACM Symposium on Cloud Computing, SOCC '14, pages 4:1--4:13, Seattle, WA, USA, 2014.
[27]
R. Escriva, A. Dubey, B. Wong, and E. G. Sirer. Kronos: The design and implementation of an event ordering service. In Proceedings of the 9th European Conference on Computer Systems, EuroSys '14, pages 3:1--3:14, Amsterdam, The Netherlands, 2014.
[28]
S. Gilbert and N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51--59, June 2002.
[29]
A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. Vl2: A scalable and flexible data center network. In Proceedings of the ACM SIGCOMM Conference on Data Communication, SIGCOMM '09, pages 51--62, Barcelona, Spain, 2009.
[30]
R. Guerraoui and A. Schiper. Genuine atomic multicast in asynchronous distributed systems. Theoretical Computer Science, 254(1):297--316, 2001.
[31]
R. Guerraoui, M. Pavlovic, and D.-A. Seredinschi. Trade-offs in replicated systems. IEEE Data Engineering Bulletin, 39: 14--26, 2016.
[32]
C. Gunawardhana, M. Bravo, and L. Rodrigues. Unobtrusive deferred update stabilization for efficient geo-replication. Arxiv preprint arXiv:1702.01786, Feb. 2017.
[33]
M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463--492, July 1990.
[34]
S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, and A. Vahdat. B4: Experience with a globally-deployed software defined wan. In Proceedings of the ACM SIGCOMM Conference, SIGCOMM '13, pages 3--14, Hong Kong, China, 2013.
[35]
R. M. Karp. Reducibility among Combinatorial Problems, pages 85--103. Springer US, Boston, MA, 1972.
[36]
R. Ladin, B. Liskov, L. Shrira, and S. Ghemawat. Providing high availability using lazy replication. ACM Trans. Comput. Syst., 10(4):360--391, Nov. 1992.
[37]
L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7), July 1978.
[38]
C. Li, D. Porto, A. Clement, J. Gehrke, N. Preguiça, and R. Rodrigues. Making geo-replicated systems fast as possible, consistent when necessary. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI '12, pages 265--278, 2012.
[39]
W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don't settle for eventual: Scalable causal consistency for wide-area storage with cops. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles, SOSP '11, pages 401--416, Cascais, Portugal, 2011.
[40]
W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Stronger semantics for low-latency geo-replicated storage. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI '13, pages 313--328, 2013.
[41]
P. Mahajan, L. Alvisi, and M. Dahlin. Consistency, availability, and convergence. Technical Report TR-11-21, University of Texas at Austin, Austin, Texas, 2011.
[42]
S. A. Mehdi, C. Littley, N. Crooks, L. Alvisi, N. Bronson, and W. Lloyd. I can't believe it's not causal! In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI '17, 2017.
[43]
NTP. The network time protocol. http://www.ntp.org.
[44]
OscaR Team. OscaR: Scala in OR. https://bitbucket.org/oscarlib/oscar.
[45]
K. Petersen, M. J. Spreitzer, D. B. Terry, M. M. Theimer, and A. J. Demers. Flexible update propagation for weakly consistent replication. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, SOSP '97, pages 288--301, Saint Malo, France, 1997.
[46]
J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez. The little engine(s) that could: Scaling online social networks. In Proceedings of the ACM SIGCOMM Conference, SIGCOMM '10, pages 375--386, New Delhi, India, 2010.
[47]
F. B. Schneider. Byzantine generals in action: Implementing fail-stop processors. ACM Trans. Comput. Syst., 2(2):145--154, May 1984.
[48]
F. B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv., 22(4):299--319, Dec. 1990.
[49]
Y. Sovran, R. Power, M. K. Aguilera, and J. Li. Transactional storage for geo-replicated systems. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles, SOSP '11, pages 385--400, Cascais, Portugal, 2011.
[50]
D. B. Terry, A. J. Demers, K. Petersen, M. J. Spreitzer, M. M. Theimer, and B. B. Welch. Session guarantees for weakly consistent replicated data. In Proceedings of 3rd International Conference on Parallel and Distributed Information Systems, PDIS '94, pages 140--149, Sep 1994.
[51]
R. Van Renesse and F. B. Schneider. Chain replication for supporting high throughput and availability. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation, OSDI '04, 2004.
[52]
B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the evolution of user interaction in facebook. In Proceedings of the 2nd ACM Workshop on Online Social Networks, WOSN '09, pages 37--42, Barcelona, Spain, 2009.
[53]
M. Zawirski, N. Preguiça, S. Duarte, A. Bieniusa, V. Balegas, and M. Shapiro. Write fast, read in the past: Causal consistency for client-side applications. In Proceedings of the 16th Annual Middleware Conference, Middleware '15, pages 75--87, Vancouver, BC, Canada, 2015.

Cited By

View all
  • (2024)Approaches to Conflict-free Replicated Data TypesACM Computing Surveys10.1145/369524957:2(1-36)Online publication date: 9-Sep-2024
  • (2024)PrompTCC: Transactional Causally Consistent Reads Can Be Fast and Fresh2024 IEEE 29th Pacific Rim International Symposium on Dependable Computing (PRDC)10.1109/PRDC63035.2024.00022(87-97)Online publication date: 13-Nov-2024
  • (2024)Large-Scale Causal Data Replication for Stateful Edge Applications2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS)10.1109/ICDCS60910.2024.00028(209-220)Online publication date: 23-Jul-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems
April 2017
648 pages
ISBN:9781450349383
DOI:10.1145/3064176
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 April 2017

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

EuroSys '17
Sponsor:
EuroSys '17: Twelfth EuroSys Conference 2017
April 23 - 26, 2017
Belgrade, Serbia

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)41
  • Downloads (Last 6 weeks)3
Reflects downloads up to 18 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Approaches to Conflict-free Replicated Data TypesACM Computing Surveys10.1145/369524957:2(1-36)Online publication date: 9-Sep-2024
  • (2024)PrompTCC: Transactional Causally Consistent Reads Can Be Fast and Fresh2024 IEEE 29th Pacific Rim International Symposium on Dependable Computing (PRDC)10.1109/PRDC63035.2024.00022(87-97)Online publication date: 13-Nov-2024
  • (2024)Large-Scale Causal Data Replication for Stateful Edge Applications2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS)10.1109/ICDCS60910.2024.00028(209-220)Online publication date: 23-Jul-2024
  • (2023)Antipode: Enforcing Cross-Service Causal Consistency in Distributed ApplicationsProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613176(298-313)Online publication date: 23-Oct-2023
  • (2023)Brief Announcement: CausalEC: A Causally Consistent Data Storage Algorithm based on Cross-Object Erasure CodingProceedings of the 2023 ACM Symposium on Principles of Distributed Computing10.1145/3583668.3594603(374-377)Online publication date: 19-Jun-2023
  • (2022)The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and AvailabilityIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.317057433:12(3850-3869)Online publication date: 1-Dec-2022
  • (2022)Omega: A Secure Event Ordering Service for the EdgeIEEE Transactions on Dependable and Secure Computing10.1109/TDSC.2021.307852019:5(2952-2964)Online publication date: 1-Sep-2022
  • (2022)Engage: Session Guarantees for the Edge2022 International Conference on Computer Communications and Networks (ICCCN)10.1109/ICCCN54977.2022.9868846(1-10)Online publication date: Jul-2022
  • (2022)GeoRep—Resilient Storage for Wide Area NetworksIEEE Access10.1109/ACCESS.2022.319168610(75772-75788)Online publication date: 2022
  • (2022)CCESHP: Causal Consistency Model of Edge Storage Based on Hash Ring and Partial Geo-ReplicationThe Computer Journal10.1093/comjnl/bxac12966:12(2874-2886)Online publication date: 16-Sep-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media