Abstract
In this paper, we present Rambo, an algorithm for emulating a read/write distributed shared memory in a dynamic, rapidly changing environment. Rambo provides a highly reliable, highly available service, even as participants join, leave, and fail. In fact, the entire set of participants may change during an execution, as the initial devices depart and are replaced by a new set of devices. Even so, Rambo ensures that data stored in the distributed shared memory remains available and consistent. Rambo uses two basic techniques to tolerate dynamic changes. Over short intervals of time, replication suffices to provide fault-tolerance: while some devices may fail and leave, the data remains available at other replicas. Over longer intervals of time, Rambo copes with changing participants via reconfiguration, which incorporates newly joined devices while excluding devices that have departed or failed. The main novelty of Rambo lies in the combination of an efficient reconfiguration mechanism with a quorum-based replication strategy for read/write shared memory. The Rambo algorithm can tolerate a wide variety of aberrant behavior, including lost and delayed messages, participants with unsynchronized clocks, and, more generally, arbitrary asynchrony. Despite such behavior, Rambo guarantees that its data is stored consistently. We analyze the performance of Rambo during periods when the system is relatively well-behaved: messages are delivered in a timely fashion, reconfiguration is not too frequent, etc. We show that in these circumstances, read and write operations are efficient, completing in at most eight message delays.
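The quorum-based replication idea mentioned in the abstract can be illustrated with a small sketch. This is not the Rambo algorithm: it omits reconfiguration, message passing, and asynchrony entirely, and it assumes a single fixed configuration with majority quorums and in-memory replicas. The names `Replica`, `QuorumRegister`, and the `(sequence number, writer id)` tag format are hypothetical choices made for illustration. The sketch only shows why any two quorums intersect and why a two-phase (query, then propagate) read or write keeps the value consistent while a minority of replicas is unavailable.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    # Tag is (sequence number, writer id); tuples compare lexicographically.
    tag: tuple = (0, "")
    value: object = None
    alive: bool = True

    def query(self):
        return self.tag, self.value

    def update(self, tag, value):
        # Adopt the incoming value only if it is strictly newer.
        if tag > self.tag:
            self.tag, self.value = tag, value


class QuorumRegister:
    """Illustrative read/write register over a fixed set of replicas."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.quorum_size = n // 2 + 1  # assumption: majority quorums

    def _quorum(self):
        live = [r for r in self.replicas if r.alive]
        if len(live) < self.quorum_size:
            raise RuntimeError("no quorum of live replicas")
        # Any majority will do; pick one at random to mimic varying responders.
        return random.sample(live, self.quorum_size)

    def _query_phase(self):
        # Collect (tag, value) pairs from a quorum and keep the newest.
        return max((r.query() for r in self._quorum()), key=lambda tv: tv[0])

    def _propagate_phase(self, tag, value):
        # Push (tag, value) to a quorum.
        for r in self._quorum():
            r.update(tag, value)

    def write(self, writer_id, value):
        (seq, _), _ = self._query_phase()
        self._propagate_phase((seq + 1, writer_id), value)

    def read(self):
        tag, value = self._query_phase()
        # Write back before returning so later reads cannot see an older value.
        self._propagate_phase(tag, value)
        return value
```

With five replicas, a write reaches three of them, so a subsequent read that contacts any three live replicas is guaranteed to see the written value even after two replicas fail. Once a majority is gone, the sketch raises an error; the abstract describes reconfiguration as the mechanism Rambo uses to cope with that longer-term churn.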
Additional information
Preliminary versions of this work appeared as the following extended abstracts: (a) Nancy A. Lynch, Alexander A. Shvartsman: RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks. DISC 2002:173–190, and (b) Seth Gilbert, Nancy A. Lynch, Alexander A. Shvartsman: RAMBO II: Rapidly Reconfigurable Atomic Memory for Dynamic Networks. DSN 2003:259–268. This work was supported in part by the NSF ITR Grant CCR-0121277. The work of the second author was additionally supported by the NSF Grant 9804665, and the work of the third author was additionally supported in part by the NSF Grants 9984778, 9988304, and 0311368.
About this article
Cite this article
Gilbert, S., Lynch, N.A. & Shvartsman, A.A. Rambo: a robust, reconfigurable atomic memory service for dynamic networks. Distrib. Comput. 23, 225–272 (2010). https://doi.org/10.1007/s00446-010-0117-1
DOI: https://doi.org/10.1007/s00446-010-0117-1