Active Disk Paxos with infinitely many processes

Chockler, Gregory; Malkhi, Dahlia

doi:10.1007/s00446-005-0123-x


Special Issue PODC
Published: 07 April 2005

Volume 18, pages 73–84, (2005)
Distributed Computing

Gregory Chockler¹ &
Dahlia Malkhi¹


Abstract

We present an improvement to the Disk Paxos protocol of Gafni and Lamport that exploits the extended functionality and flexibility of Active Disks and supports unmediated concurrent data access by an unlimited number of processes. The solution lets an infinite number of clients coordinate using finite shared memory. It is based on a collection of fault-prone read-modify-write objects that emulate a new, reliable shared memory abstraction called a ranked register. The required read-modify-write objects are readily available in Active Disks and in Object Storage Device controllers, which makes our solution suitable for state-of-the-art Storage Area Network (SAN) environments.
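The abstract names the ranked register but does not give its interface. The following is a minimal sketch of how such a register might behave on a single, fault-free read-modify-write object. The class name, the method names `rr_read` and `rr_write`, the rank rules, and the `propose` helper are all assumptions made for illustration; they are not taken from this page, and the sketch omits the emulation over multiple faulty objects that the abstract describes.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class RankedRegister:
    """Hypothetical single-object sketch.

    Each method stands for one atomic read-modify-write step on the disk.
    """
    read_rank: int = 0            # highest rank presented by any read so far
    write_rank: int = 0           # rank of the last committed write
    value: Optional[Any] = None   # last committed value, None if none yet

    def rr_read(self, rank: int) -> Tuple[int, Optional[Any]]:
        # Remember the reader's rank so that lower-ranked writes abort later.
        self.read_rank = max(self.read_rank, rank)
        return self.write_rank, self.value

    def rr_write(self, rank: int, value: Any) -> bool:
        # Commit only if no read or write with a higher rank has been seen.
        if rank >= self.read_rank and rank >= self.write_rank:
            self.write_rank, self.value = rank, value
            return True
        return False  # abort


def propose(reg: RankedRegister, rank: int, my_value: Any) -> Optional[Any]:
    """One Paxos-style ballot (assumed usage): read, adopt, then write."""
    _, seen = reg.rr_read(rank)
    chosen = my_value if seen is None else seen
    return chosen if reg.rr_write(rank, chosen) else None
```

Under these assumptions the register state is a fixed-size triple no matter how many clients call `propose`, which is the sense in which finite shared memory can serve an unbounded set of processes. A client whose write aborts would retry with a higher rank.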


References

Afek, Y., Greenberg, D.S., Merritt, M., Taubenfeld, G.: Computing with faulty shared objects. J. ACM 42(6), 1231-1274 (1995)
Acharya, A., Uysal, M., Saltz, J.: Active Disks: programming model, algorithms and evaluation. In: Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII) (1998)
Amiri, K., Gibson, G.A., Golding, R.: Highly concurrent shared storage. In: Proceedings of the International Conference on Distributed Computing Systems (ICDCS'2000) (2000)
Anderson, T., Dahlin, M., Neefe, J., Patterson, D., Roselli, D., Wang, R.: Serverless network file systems. ACM Trans. Comput. Syst. 14(1), 41-79 (1996)
Birman, I.K., Joseph, T.: Exploiting virtual synchrony in distributed systems. In: Proceedings of the 11th Annual Symposium on Operating Systems Principles, pp. 123-138 (1987)
Boichat, R., Dutta, P., Frolund, S., Guerraoui, R.: Deconstructing Paxos. Technical Report DSC ID:200106, Communication Systems Department (DSC), École Polytechnique Fédérale de Lausanne (EPFL) (2001). Available at http://dscwww.epfl.ch/EN/publications/documents/tr01\006.pdf
Boichat, R., Dutta, P., Frolund, S., Guerraoui, R.: Deconstructing Paxos. ACM SIGACT News Distrib. Comput. Column. 34(1), 47-67 (2003)
Burns, R.: Data management in a distributed file system for Storage Area Networks. PhD Thesis, Department of Computer Science, University of California, Santa Cruz (2000)
Burns, J., Lynch, N.: Bounds on shared memory for mutual exclusion. Inform. Comput. 107(2), 171-184 (1993)
Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685-722 (1996)
Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225-267 (1996)
Chockler, G.V., Keidar, I., Vitenberg, R.: Group communication specifications: a comprehensive study. ACM Comput. Surv. 33(4), 1-43 (2001)
Chockler, G.V., Keidar, I., Malkhi, D.: Computing with Byzantine storage. In: Preparation.
Chockler, G., Malkhi, D., Dolev, D.: State-machine replication with infinitely many processes: a position paper. In: Proceedings of the International Workshop on Future Directions in Distributed Computing (FuDiCo), Bertinoro, Italy (2002)
Chockler, G., Malkhi, D., Reiter, M.K.: Backoff protocols for distributed mutual exclusion and ordering. In: Proceedings of the 21st International Conference on Distributed Computing Systems, pp. 11-20 (2001)
Chor, B., Dwork, C.: Randomization in Byzantine agreement. In: Micali, S. (ed.). Advances in Computing Research, Randomness in Computation, vol. 5, pp. 443-497. JAI Press (1989)
Cristian, F., Fetzer, C.: The timed asynchronous distributed system model. In: Proceedings of the 28th Annual International Symposium on Fault-Tolerant Computing (1998)
DePrisco, R., Lampson, B., Lynch, N.: Fundamental study: revisiting the Paxos algorithm. Theoret. Comput. Sci. 243, 35-91 (2000)
Dolev, D., Dwork, C., Stockmeyer, L.: On the minimal synchronism needed for distributed consensus. J. ACM 34(1), 77-97 (1987)
Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288-323 (1988)
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374-382 (1985)
Fekete, A., Lynch, N., Shvartsman, A.: Specifying and using a partitionable group communication service. ACM Trans. Comput. Syst. 19(2), 171-216 (2001)
Gafni, E., Lamport, L.: Disk Paxos. Distribut. Comput. 16(1), 1-20 (2003)
Gafni, E., Merritt, M., Taubenfeld, G.: The concurrency hierarchy, and algorithms for unbounded concurrency. In: Proceedings of the 20th ACM Symposium on Principles of Distributed Computing (PODC 2001) (2001)
Gibson, G.A., Nagle, D.F., Amiri, K., Butler, J., Chang, F.W., Gobioff, H., Hardin, C., Riedel, E., Rochberg, D., Zelenka, J.: A cost-effective high-bandwidth storage architecture. In: Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (1998)
Gibson, G.A., Nagle, D.F., Amiri, K., Chang, F.W., Gobioff, H., Riedel, E., Rochberg, D., Zelenka, J.: Filesystems for network-attached secure disks. Technical Report CMU-CS-97-118 (1997)
Gobioff, H., Gibson, G.A., Tygar, D.: Security for network attached storage devices. Technical Report CMU-CS-97-185 (1997)
Hotz, S., Van Meter, R., Finn, G.: Internet protocols for network-attached peripherals. In: Proceedings of the Sixth NASA Goddard Conference on Mass Storage Systems and Technologies in conjunction with 15th IEEE Symposium on Mass Storage Systems (1998)
Hartman, J.H., Murdock, I., Spalink, T.: The Swarm scalable storage system. In: Proceedings of the 19th IEEE International Conference on Distributed Computing Systems (ICDCS'99) (1999)
Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Languag. Syst. 13(1), 124-149 (1991)
Jayanti, P., Chandra, T., Toueg, S.: Fault-tolerant wait-free shared objects. J. ACM 45(3), 451-500 (1998)
Keidar, I., Dolev, D.: Totally ordered broadcast in the face of network partitions: exploiting group communication for replication in partitionable networks. In: Avresky, D. (ed.). Dependable Network Computing, Chap. 3. Kluwer Academic Publications (2000)
Lamport, L.: Time, clocks, and the ordering of events in distributed systems. Communi. ACM 21(7), 558-565 (1978)
Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133-169 (1998)
Lamport, L.: Paxos made simple. Distribut. Comput. Column. SIGACT News 32(4), 34-58 (2001)
Lamport, L., Shostak, R., Pease, M.: The Byzantine generals problem. ACM Trans. Program. Languag. Syst. 4(3), 382-401 (1982)
Lampson, B.W.: How to build a highly available system using consensus. In: Proceedings of the 10th International Workshop on Distributed Algorithms (WDAG), LNCS 1151. Springer-Verlag, Berlin (1996)
Lee, E.K., Thekkath, C.: Petal: distributed virtual disks. In: Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pp. 84-92 (1996)
Lo, W.K., Hadzilacos, V.: Using failure detectors to solve consensus in asynchronous shared-memory systems. In: Proceedings of the 8th International Workshop on Distributed Algorithms (WDAG), LNCS 857, pp. 280-295. Springer-Verlag, Berlin (1994)
Loui, M.C., Abu-Amara, H.H.: Memory requirements for agreement among unreliable asynchronous processes. In: Franco, P.P. (ed.). Parallel and Distributed Computing: vol. 4 of Advances in Computing Research, pp. 163-183. JAI Press, Greenwich, Conn. (1987)
Malkhi, D.: From Byzantine agreement to practical survivability. In: The International Workshop on Self-Repairing and Self-Configurable Distributed Systems (RCDS'2002) Osaka, Japan (2002)
Malkhi, D., Reiter, M.K.: An architecture for survivable coordination in large-scale systems. IEEE Transact. Knowledge Data Eng. 12(2), 187-202 (2000)
Merritt, M., Taubenfeld, G.: Computing with infinitely many processes. In: Proceedings of 14th International Symposium on Distributed Computing (DISC'2000), pp. 164-178 (2000)
Mostéfaoui, A., Raynal, M.: Leader-based consensus. Parallel Process. Lett. 11(1), 95-107 (2001)
National Storage Industry Consortium. http://www.nsic.org/nasd
Powell, D. (ed.): Group communication. Commun. ACM 39(4), 50-97 (1996)
Riedel, E., Faloutsos, C., Gibson, G.A., Nagle, D.: Active disks for large-scale data processing. IEEE Comput. 68-74 (2001)
Skeen, M.D.: Nonblocking commit protocols. In: SIGMOD International Conference Management of Data (1981)
Skeen, M.D.: Crash recovery in a distributed database system. PhD Thesis, UC Berkeley (1982)
Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. 22(4), 299-319 (1990)
Thekkath, C., Mann, T., Lee, E.K.: Frangipani: a scalable distributed file system. In: Proceedings of the 16th ACM Symposium on Operating Systems Principles, pp. 224-237 (1997)


Author information

Authors and Affiliations

MIT Computer Science and Artificial Intelligence Laboratory, The Stata Center, Building 32, 32 Vassar St., 32-G696, Cambridge, MA, 02139, USA
Gregory Chockler & Dahlia Malkhi


Corresponding author

Correspondence to Dahlia Malkhi.

Additional information

A preliminary version of this work appears in Proceedings of the 21st ACM Symposium on Principles of Distributed Computing (PODC02), August 2002.


About this article

Cite this article

Chockler, G., Malkhi, D. Active Disk Paxos with infinitely many processes. Distrib. Comput. 18, 73–84 (2005). https://doi.org/10.1007/s00446-005-0123-x


Received: 01 October 2002
Revised: 01 June 2003
Accepted: 01 September 2004
Published: 07 April 2005
Issue Date: July 2005
DOI: https://doi.org/10.1007/s00446-005-0123-x
