DOI:10.1145/2523616.2523623
research-article

Leveraging sharding in the design of scalable replication protocols

Published: 01 October 2013

Abstract

Most, if not all, datacenter services use sharding and replication for scalability and reliability. Shards are more or less independent of one another and are individually replicated. In this paper, we challenge this design philosophy and present a replication protocol in which the shards interact with one another: a protocol running within each shard ensures linearizable consistency, while the shards interact in order to improve availability. We provide a specification for the protocol, prove its safety, analyze its liveness and availability properties, and evaluate a working implementation.

Published In

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing
October 2013
427 pages
ISBN:9781450324281
DOI:10.1145/2523616
Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SOCC '13: ACM Symposium on Cloud Computing
October 1 - 3, 2013
Santa Clara, California

Acceptance Rates

SOCC '13 Paper Acceptance Rate: 23 of 114 submissions, 20%
Overall Acceptance Rate: 169 of 722 submissions, 23%
