Abstract
We present an improvement to the Disk Paxos protocol of Gafni and Lamport that exploits the extended functionality and flexibility of Active Disks and supports unmediated concurrent data access by an unlimited number of processes. The solution lets an infinite number of clients coordinate using finite shared memory. It is based on a collection of fault-prone read-modify-write objects that emulate a new, reliable shared memory abstraction called a ranked register. The required read-modify-write objects are readily available in Active Disks and in Object Storage Device controllers, making our solution suitable for state-of-the-art Storage Area Network (SAN) environments.
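To make the ranked register idea more concrete, the following is a minimal sketch and not the construction from the article. The abstract does not spell out the register's semantics, so the sketch assumes them: every operation carries an integer rank, a read returns the last committed rank and value, and a write commits only if no read or write with a higher rank has been seen. The class and method names (`RankedRegister`, `rr_read`, `rr_write`) are hypothetical, a lock stands in for the atomic read-modify-write that an Active Disk would provide, and the fault-tolerant emulation over several faulty objects is not shown.

```python
import threading
from typing import Any, Tuple


class RankedRegister:
    """Illustrative single-object ranked register (assumed semantics)."""

    def __init__(self) -> None:
        # The lock models one atomic read-modify-write step on the disk object.
        self._lock = threading.Lock()
        self._max_read_rank = 0   # highest rank seen in any read so far
        self._write_rank = 0      # rank of the last committed write
        self._value: Any = None   # value of the last committed write

    def rr_read(self, rank: int) -> Tuple[int, Any]:
        """Record the reader's rank, then return the last committed (rank, value)."""
        with self._lock:
            self._max_read_rank = max(self._max_read_rank, rank)
            return self._write_rank, self._value

    def rr_write(self, rank: int, value: Any) -> bool:
        """Commit (True) unless a higher-ranked read or write was already seen."""
        with self._lock:
            if rank >= self._max_read_rank and rank >= self._write_rank:
                self._write_rank = rank
                self._value = value
                return True
            return False  # abort: a higher-ranked operation got there first


if __name__ == "__main__":
    reg = RankedRegister()
    reg.rr_read(1)
    print(reg.rr_write(1, "a"))   # commits: no higher rank has been seen
    reg.rr_read(3)
    print(reg.rr_write(2, "b"))   # aborts: a read with rank 3 was already seen
```

Under these assumed semantics, a process that reads with a high rank fences off lower-ranked writers, which is the kind of guarantee a Paxos-style protocol needs from its shared memory. Because the state is three fields regardless of how many processes use it, the sketch also hints at how finite memory can serve an unbounded set of clients.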
References
Afek, Y., Greenberg, D.S., Merritt, M., Taubenfeld, G.: Computing with faulty shared objects. J. ACM 42(6), 1231-1274 (1995)
Acharya, A., Uysal, M., Saltz, J.: Active Disks: programming model, algorithms and evaluation. In: Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII) (1998)
Amiri, K., Gibson, G.A., Golding, R.: Highly concurrent shared storage. In: Proceedings of the International Conference on Distributed Computing Systems (ICDCS'2000) (2000)
Anderson, T., Dahlin, M., Neefe, J., Patterson, D., Roselli, D., Wang, R.: Serverless network file systems. ACM Trans. Comput. Syst. 14(1), 41-79 (1996)
Birman, K.P., Joseph, T.A.: Exploiting virtual synchrony in distributed systems. In: Proceedings of the 11th Annual Symposium on Operating Systems Principles, pp. 123-138 (1987)
Boichat, R., Dutta, P., Frolund, S., Guerraoui, R.: Deconstructing Paxos. Technical Report DSC ID:200106, Communication Systems Department (DSC), École Polytechnique Fédérale de Lausanne (EPFL) (2001). Available at http://dscwww.epfl.ch/EN/publications/documents/tr01\006.pdf
Boichat, R., Dutta, P., Frolund, S., Guerraoui, R.: Deconstructing Paxos. ACM SIGACT News Distrib. Comput. Column. 34(1), 47-67 (2003)
Burns, R.: Data management in a distributed file system for Storage Area Networks. PhD Thesis, Department of Computer Science, University of California, Santa Cruz (2000)
Burns, J., Lynch, N.: Bounds on shared memory for mutual exclusion. Inform. Comput. 107(2), 171-184 (1993)
Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685-722 (1996)
Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225-267 (1996)
Chockler, G.V., Keidar, I., Vitenberg, R.: Group communication specifications: a comprehensive study. ACM Comput. Surv. 33(4), 1-43 (2001)
Chockler, G.V., Keidar, I., Malkhi, D.: Computing with Byzantine storage. In preparation.
Chockler, G., Malkhi, D., Dolev, D.: State-machine replication with infinitely many processes: a position paper. In: Proceedings of the International Workshop on Future Directions in Distributed Computing (FuDiCo), Bertinoro, Italy (2002)
Chockler, G., Malkhi, D., Reiter, M.K.: Backoff protocols for distributed mutual exclusion and ordering. In: Proceedings of the 21st International Conference on Distributed Computing Systems, pp. 11-20 (2001)
Chor, B., Dwork, C.: Randomization in Byzantine agreement. In: Micali, S. (ed.). Advances in Computing Research, Randomness in Computation, vol. 5, pp. 443-497. JAI Press (1989)
Cristian, F., Fetzer, C.: The timed asynchronous distributed system model. In: Proceedings of the 28th Annual International Symposium on Fault-Tolerant Computing (1998)
DePrisco, R., Lampson, B., Lynch, N.: Fundamental study: revisiting the Paxos algorithm. Theoret. Comput. Sci. 243, 35-91 (2000)
Dolev, D., Dwork, C., Stockmeyer, L.: On the minimal synchronism needed for distributed consensus. J. ACM 34(1), 77-97 (1987)
Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288-323 (1988)
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374-382 (1985)
Fekete, A., Lynch, N., Shvartsman, A.: Specifying and using a partitionable group communication service. ACM Trans. Comput. Syst. 19(2), 171-216 (2001)
Gafni, E., Lamport, L.: Disk Paxos. Distribut. Comput. 16(1), 1-20 (2003)
Gafni, E., Merritt, M., Taubenfeld, G.: The concurrency hierarchy, and algorithms for unbounded concurrency. In: Proceedings of the 20th ACM Symposium on Principles of Distributed Computing (PODC 2001) (2001)
Gibson, G.A., Nagle, D.F., Amiri, K., Butler, J., Chang, F.W., Gobioff, H., Hardin, C., Riedel, E., Rochberg, D., Zelenka, J.: A cost-effective high-bandwidth storage architecture. In: Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (1998)
Gibson, G.A., Nagle, D.F., Amiri, K., Chang, F.W., Gobioff, H., Riedel, E., Rochberg, D., Zelenka, J.: Filesystems for network-attached secure disks. Technical Report CMU-CS-97-118 (1997)
Gobioff, H., Gibson, G.A., Tygar, D.: Security for network attached storage devices. Technical Report CMU-CS-97-185 (1997)
Hotz, S., Van Meter, R., Finn, G.: Internet protocols for network-attached peripherals. In: Proceedings of the Sixth NASA Goddard Conference on Mass Storage Systems and Technologies in conjunction with 15th IEEE Symposium on Mass Storage Systems (1998)
Hartman, J.H., Murdock, I., Spalink, T.: The Swarm scalable storage system. In: Proceedings of the 19th IEEE International Conference on Distributed Computing Systems (ICDCS'99) (1999)
Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Languag. Syst. 13(1), 124-149 (1991)
Jayanti, P., Chandra, T., Toueg, S.: Fault-tolerant wait-free shared objects. J. ACM 45(3), 451-500 (1998)
Keidar, I., Dolev, D.: Totally ordered broadcast in the face of network partitions: exploiting group communication for replication in partitionable networks. In: Avresky, D. (ed.). Dependable Network Computing, Chap. 3. Kluwer Academic Publications (2000)
Lamport, L.: Time, clocks, and the ordering of events in distributed systems. Commun. ACM 21(7), 558-565 (1978)
Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133-169 (1998)
Lamport, L.: Paxos made simple. Distribut. Comput. Column. SIGACT News 32(4), 34-58 (2001)
Lamport, L., Shostak, R., Pease, M.: The Byzantine generals problem. ACM Trans. Program. Languag. Syst. 4(3), 382-401 (1982)
Lampson, B.W.: How to build a highly available system using consensus. In: Proceedings of the 10th International Workshop on Distributed Algorithms (WDAG), LNCS 1151. Springer-Verlag, Berlin (1996)
Lee, E.K., Thekkath, C.: Petal: distributed virtual disks. In: Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pp. 84-92 (1996)
Lo, W.K., Hadzilacos, V.: Using failure detectors to solve consensus in asynchronous shared-memory systems. In: Proceedings of the 8th International Workshop on Distributed Algorithms (WDAG), LNCS 857, pp. 280-295. Springer-Verlag, Berlin (1994)
Loui, M.C., Abu-Amara, H.H.: Memory requirements for agreement among unreliable asynchronous processes. In: Franco, P.P. (ed.). Parallel and Distributed Computing: vol. 4 of Advances in Computing Research, pp. 163-183. JAI Press, Greenwich, Conn. (1987)
Malkhi, D.: From Byzantine agreement to practical survivability. In: The International Workshop on Self-Repairing and Self-Configurable Distributed Systems (RCDS'2002) Osaka, Japan (2002)
Malkhi, D., Reiter, M.K.: An architecture for survivable coordination in large-scale systems. IEEE Transact. Knowledge Data Eng. 12(2), 187-202 (2000)
Merritt, M., Taubenfeld, G.: Computing with infinitely many processes. In: Proceedings of 14th International Symposium on Distributed Computing (DISC'2000), pp. 164-178 (2000)
Mostéfaoui, A., Raynal, M.: Leader-based consensus. Parallel Process. Lett. 11(1), 95-107 (2001)
National Storage Industry Consortium. http://www.nsic.org/nasd
Powell, D. (ed.): Group communication. Commun. ACM 39(4), 50-97 (1996)
Riedel, E., Faloutsos, C., Gibson, G.A., Nagle, D.: Active disks for large-scale data processing. IEEE Comput. 68-74 (2001)
Skeen, M.D.: Nonblocking commit protocols. In: SIGMOD International Conference Management of Data (1981)
Skeen, M.D.: Crash recovery in a distributed database system. PhD Thesis, UC Berkeley (1982)
Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. 22(4), 299-319 (1990)
Thekkath, C., Mann, T., Lee, E.K.: Frangipani: a scalable distributed file system. In: Proceedings of the 16th ACM Symposium on Operating Systems Principles, pp. 224-237 (1997)
Additional information
A preliminary version of this work appears in Proceedings of the 21st ACM Symposium on Principles of Distributed Computing (PODC02), August 2002.
Cite this article
Chockler, G., Malkhi, D. Active Disk Paxos with infinitely many processes. Distrib. Comput. 18, 73–84 (2005). https://doi.org/10.1007/s00446-005-0123-x
DOI: https://doi.org/10.1007/s00446-005-0123-x