Leveraging sharding in the design of scalable replication protocols

Authors:

Hussam Abu-Libdeh,

Robbert van Renesse,

Ymir Vigfusson

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

Article No.: 12, Pages 1 - 16

https://doi.org/10.1145/2523616.2523623

Published: 01 October 2013

Abstract

Most if not all datacenter services use sharding and replication for scalability and reliability. Shards are more-or-less independent of one another and individually replicated. In this paper, we challenge this design philosophy and present a replication protocol where the shards interact with one another: A protocol running within shards ensures linearizable consistency, while the shards interact in order to improve availability. We provide a specification for the protocol, prove its safety, analyze its liveness and availability properties, and evaluate a working implementation.

Published In

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

October 2013

427 pages

ISBN:9781450324281

DOI:10.1145/2523616

General Chair:
Guy Lohman

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SOCC '13

SOCC '13: ACM Symposium on Cloud Computing

October 1 - 3, 2013

California, Santa Clara

Acceptance Rates

SOCC '13 Paper Acceptance Rate 23 of 114 submissions, 20%;

Overall Acceptance Rate 169 of 722 submissions, 23%

Cited By

Liya BS PS RK N (2023). Decentralized E-Commerce Platform Implemented using Smart Contracts. 2023 3rd International Conference on Smart Data Intelligence (ICSMDI), pages 23-27. Online publication date: Mar-2023.
https://doi.org/10.1109/ICSMDI57622.2023.00013
Ganesan A, Alagappan R, Rebello A, Arpaci-Dusseau A, Arpaci-Dusseau R (2022). Exploiting Nil-external Interfaces for Fast Replicated Storage. ACM Transactions on Storage, 18:3, pages 1-35. Online publication date: 2-Sep-2022.
https://dl.acm.org/doi/10.1145/3542821
Ganesan A, Alagappan R, Arpaci-Dusseau A, Arpaci-Dusseau R (2021). Exploiting Nil-Externality for Fast Replicated Storage. Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, pages 440-456. Online publication date: 26-Oct-2021.
https://dl.acm.org/doi/10.1145/3477132.3483543
Liu J, Shen H, Chi H, Narman H, Yang Y, Cheng L, Chung W (2021). A Low-Cost Multi-Failure Resilient Replication Scheme for High-Data Availability in Cloud Storage. IEEE/ACM Transactions on Networking, 29:4, pages 1436-1451. Online publication date: Aug-2021.
https://doi.org/10.1109/TNET.2020.3027814
Brooker M, Chen T, Ping F, Bhagwan R, Porter G (2020). Millions of tiny databases. Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation, pages 463-478. Online publication date: 25-Feb-2020.
https://dl.acm.org/doi/10.5555/3388242.3388276
Dhulavvagol P, Bhajantri V, Totad S (2020). Performance Analysis of Distributed Processing System using Shard Selection Techniques on Elasticsearch. Procedia Computer Science, 167, pages 1626-1635. Online publication date: 2020.
https://doi.org/10.1016/j.procs.2020.03.373
Liu J, Shen H, Narman H (2019). Popularity-Aware Multi-Failure Resilient and Cost-Effective Replication for High Data Durability in Cloud Storage. IEEE Transactions on Parallel and Distributed Systems, 30:10, pages 2355-2369. Online publication date: 1-Oct-2019.
https://doi.org/10.1109/TPDS.2018.2873384
Liu C, Xiao Y, Javangula V, Hu Q, Wang S, Cheng X (2019). NormaChain: A Blockchain-Based Normalized Autonomous Transaction Settlement System for IoT-Based E-Commerce. IEEE Internet of Things Journal, 6:3, pages 4680-4693. Online publication date: Jun-2019.
https://doi.org/10.1109/JIOT.2018.2877634
Gramoli V, Bass L, Fekete A, Sun D (2016). Rollup. IEEE Transactions on Parallel and Distributed Systems, 27:9, pages 2711-2724. Online publication date: 1-Sep-2016.
https://dl.acm.org/doi/10.1109/TPDS.2015.2499772
Liu G, Shen H, Chandler H (2016). Selective Data Replication for Online Social Networks with Distributed Datacenters. IEEE Transactions on Parallel and Distributed Systems, 27:8, pages 2377-2393. Online publication date: 1-Aug-2016.
https://doi.org/10.1109/TPDS.2015.2485266
