ABSTRACT
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but they fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks when data is distributed and replicated automatically by the principle of consistent hashing [14]. This work introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a key-value store that uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing; key properties for modern cloud storage middleware. Our system evaluation shows that consistency can be achieved with practical performance and modest overhead: a 5% decrease in throughput for read-intensive workloads and a 25% throughput loss for write-intensive workloads. CATS delivers sub-millisecond operation latencies under light load and single-digit millisecond operation latencies at 50% load, and it sustains a throughput of one thousand operations per second, per server, while scaling linearly to hundreds of servers.
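The abstract's premise is that data placement follows consistent hashing [14]: servers and keys are hashed onto the same circular identifier space, and each key is stored on the first server clockwise from its hash plus the next few distinct servers as replicas. As background, the following is a minimal illustrative sketch of such a ring (not CATS's actual implementation; the class name, server names, and replication degree are assumptions for the example):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing ring: servers and keys are hashed onto one
    circular identifier space; a key is owned by the first server clockwise
    from the key's hash, and replicated on that server's successors."""

    def __init__(self, replicas=3):
        self.replicas = replicas   # replication degree (size of a replica group)
        self.ring = []             # sorted hash positions of servers
        self.servers = {}          # hash position -> server name

    def _hash(self, value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def add_server(self, name):
        pos = self._hash(name)
        bisect.insort(self.ring, pos)
        self.servers[pos] = name

    def replica_group(self, key):
        """Return the servers responsible for `key`: the successor of the
        key's hash position, plus the next replicas-1 distinct servers."""
        if not self.ring:
            return []
        start = bisect.bisect(self.ring, self._hash(key)) % len(self.ring)
        return [self.servers[self.ring[(start + i) % len(self.ring)]]
                for i in range(min(self.replicas, len(self.ring)))]

ring = ConsistentHashRing(replicas=3)
for server in ["s1", "s2", "s3", "s4", "s5"]:
    ring.add_server(server)
print(ring.replica_group("user:42"))  # three distinct servers, clockwise from the key
```

In a quorum-based design such as the one the abstract describes, a read or write for a key would then contact the key's replica group and complete once a majority (e.g., 2 of 3) acknowledges; the paper's consistent quorums additionally ensure that operations agree on the group's membership as the ring changes.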
- M. K. Aguilera, I. Keidar, D. Malkhi, and A. Shraer. Dynamic atomic storage without consensus. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing, PODC '09, pages 17--25, New York, NY, USA, 2009. ACM.
- C. Arad. Programming Model and Protocols for Reconfigurable Distributed Systems. PhD thesis, KTH -- Royal Institute of Technology, Stockholm, Sweden, June 2013.
- C. Arad, T. M. Shafaat, and S. Haridi. CATS: Linearizability and partition tolerance in scalable and self-organizing key-value stores. Technical Report T2012:04, Swedish Institute of Computer Science, 2012.
- M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, OSDI '06, pages 335--350, Berkeley, CA, USA, 2006. USENIX.
- T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: an engineering perspective. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing, PODC '07, pages 398--407, New York, NY, USA, 2007. ACM.
- F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1--4:26, June 2008.
- G. V. Chockler, S. Gilbert, V. Gramoli, P. M. Musial, and A. A. Shvartsman. Reconfigurable distributed storage for dynamic networks. J. Parallel Distrib. Comput., 69:100--116, January 2009.
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In Proceedings of the Twenty-First Symposium on Operating Systems Principles, SOSP '07, pages 205--220, New York, NY, USA, 2007. ACM.
- S. Gilbert, N. A. Lynch, and A. A. Shvartsman. Rambo II: rapidly reconfigurable atomic memory for dynamic networks. In International Conference on Dependable Systems and Networks, DSN '03, pages 259--268, 2003.
- L. Glendenning, I. Beschastnikh, A. Krishnamurthy, and T. Anderson. Scalable consistency in Scatter. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 15--28, New York, NY, USA, 2011. ACM.
- P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: wait-free coordination for internet-scale systems. In Proceedings of the 2010 USENIX Annual Technical Conference, ATC '10, pages 1--14, Berkeley, CA, USA, 2010. USENIX Association.
- HBase. http://hbase.apache.org/, 2012.
- F. P. Junqueira, B. C. Reed, and M. Serafini. Zab: High-performance broadcast for primary-backup systems. In Proceedings of the 41st International Conference on Dependable Systems & Networks, DSN '11, pages 245--256, Washington, DC, USA, 2011. IEEE Computer Society.
- D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC '97, pages 654--663, New York, NY, USA, 1997. ACM.
- A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44(2):35--40, April 2010.
- J. R. Lorch, A. Adya, W. J. Bolosky, R. Chaiken, J. R. Douceur, and J. Howell. The SMART way to migrate replicated stateful services. In Proceedings of the 1st EuroSys European Conference on Computer Systems, EuroSys '06, pages 103--115, New York, NY, USA, 2006. ACM.
- N. A. Lynch and A. A. Shvartsman. RAMBO: A reconfigurable atomic memory service for dynamic networks. In Proceedings of the 16th International Conference on Distributed Computing, DISC '02, pages 173--190, London, UK, 2002. Springer-Verlag.
- MongoDB. http://www.mongodb.org/, 2012.
- J. Rao, E. J. Shekita, and S. Tata. Using Paxos to build a scalable, consistent, and highly available datastore. Proc. VLDB Endow., 4:243--254, January 2011.
- T. M. Shafaat. Partition Tolerance and Data Consistency in Structured Overlay Networks. PhD thesis, KTH -- Royal Institute of Technology, Stockholm, Sweden, June 2013.
- I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM '01, pages 149--160, New York, NY, USA, 2001. ACM.
- D. Terry, M. Theimer, K. Petersen, A. Demers, M. Spreitzer, and C. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, SOSP '95, pages 172--182, New York, NY, USA, 1995. ACM.
Index Terms
- CATS: a linearizable and self-organizing key-value store