Active Disk Paxos with infinitely many processes

  • Special Issue PODC
Distributed Computing

Abstract

We present an improvement to the Disk Paxos protocol of Gafni and Lamport that exploits the extended functionality and flexibility of Active Disks and supports unmediated concurrent data access by an unlimited number of processes. The solution enables an infinite number of clients to coordinate using finite shared memory. It is based on a collection of fault-prone read-modify-write objects that emulate a new, reliable shared-memory abstraction called a ranked register. The required read-modify-write objects are readily available in Active Disks and in Object Storage Device controllers, making our solution suitable for state-of-the-art Storage Area Network (SAN) environments.

References

  1. Afek, Y., Greenberg, D.S., Merritt, M., Taubenfeld, G.: Computing with faulty shared objects. J. ACM 42(6), 1231-1274 (1995)

  2. Acharya, A., Uysal, M., Saltz, J.: Active Disks: programming model, algorithms and evaluation. In: Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII) (1998)

  3. Amiri, K., Gibson, G.A., Golding, R.: Highly concurrent shared storage. In: Proceedings of the International Conference on Distributed Computing Systems (ICDCS'2000) (2000)

  4. Anderson, T., Dahlin, M., Neefe, J., Patterson, D., Roselli, D., Wang, R.: Serverless network file systems. ACM Trans. Comput. Syst. 14(1), 41-79 (1996)

  5. Birman, K., Joseph, T.: Exploiting virtual synchrony in distributed systems. In: Proceedings of the 11th Annual Symposium on Operating Systems Principles, pp. 123-138 (1987)

  6. Boichat, R., Dutta, P., Frolund, S., Guerraoui, R.: Deconstructing Paxos. Technical Report DSC ID:200106, Communication Systems Department (DSC), École Polytechnique Fédérale de Lausanne (EPFL) (2001). Available at http://dscwww.epfl.ch/EN/publications/documents/tr01\006.pdf

  7. Boichat, R., Dutta, P., Frolund, S., Guerraoui, R.: Deconstructing Paxos. ACM SIGACT News Distrib. Comput. Column. 34(1), 47-67 (2003)

  8. Burns, R.: Data management in a distributed file system for Storage Area Networks. PhD Thesis, Department of Computer Science, University of California, Santa Cruz (2000)

  9. Burns, J., Lynch, N.: Bounds on shared memory for mutual exclusion. Inform. Comput. 107(2), 171-184 (1993)

  10. Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685-722 (1996)

  11. Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225-267 (1996)

  12. Chockler, G.V., Keidar, I., Vitenberg, R.: Group communication specifications: a comprehensive study. ACM Comput. Surv. 33(4), 1-43 (2001)

  13. Chockler, G.V., Keidar, I., Malkhi, D.: Computing with Byzantine storage. In preparation.

  14. Chockler, G., Malkhi, D., Dolev, D.: State-machine replication with infinitely many processes: a position paper. In: Proceedings of the International Workshop on Future Directions in Distributed Computing (FuDiCo), Bertinoro, Italy (2002)

  15. Chockler, G., Malkhi, D., Reiter, M.K.: Backoff protocols for distributed mutual exclusion and ordering. In: Proceedings of the 21st International Conference on Distributed Computing Systems, pp. 11-20 (2001)

  16. Chor, B., Dwork, C.: Randomization in Byzantine agreement. In: Micali, S. (ed.). Advances in Computing Research, Randomness in Computation, vol. 5, pp. 443-497. JAI Press (1989)

  17. Cristian, F., Fetzer, C.: The timed asynchronous distributed system model. In: Proceedings of the 28th Annual International Symposium on Fault-Tolerant Computing (1998)

  18. DePrisco, R., Lampson, B., Lynch, N.: Fundamental study: revisiting the Paxos algorithm. Theoret. Comput. Sci. 243, 35-91 (2000)

  19. Dolev, D., Dwork, C., Stockmeyer, L.: On the minimal synchronism needed for distributed consensus. J. ACM 34(1), 77-97 (1987)

  20. Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288-323 (1988)

  21. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374-382 (1985)

  22. Fekete, A., Lynch, N., Shvartsman, A.: Specifying and using a partitionable group communication service. ACM Trans. Comput. Syst. 19(2), 171-216 (2001)

  23. Gafni, E., Lamport, L.: Disk Paxos. Distrib. Comput. 16(1), 1-20 (2003)

  24. Gafni, E., Merritt, M., Taubenfeld, G.: The concurrency hierarchy, and algorithms for unbounded concurrency. In: Proceedings of the 20th ACM Symposium on Principles of Distributed Computing (PODC 2001) (2001)

  25. Gibson, G.A., Nagle, D.F., Amiri, K., Butler, J., Chang, F.W., Gobioff, H., Hardin, C., Riedel, E., Rochberg, D., Zelenka, J.: A cost-effective high-bandwidth storage architecture. In: Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (1998)

  26. Gibson, G.A., Nagle, D.F., Amiri, K., Chang, F.W., Gobioff, H., Riedel, E., Rochberg, D., Zelenka, J.: Filesystems for network-attached secure disks. Technical Report CMU-CS-97-118 (1997)

  27. Gobioff, H., Gibson, G.A., Tygar, D.: Security for network attached storage devices. Technical Report CMU-CS-97-185 (1997)

  28. Hotz, S., Van Meter, R., Finn, G.: Internet protocols for network-attached peripherals. In: Proceedings of the Sixth NASA Goddard Conference on Mass Storage Systems and Technologies in conjunction with 15th IEEE Symposium on Mass Storage Systems (1998)

  29. Hartman, J.H., Murdock, I., Spalink, T.: The Swarm scalable storage system. In: Proceedings of the 19th IEEE International Conference on Distributed Computing Systems (ICDCS'99) (1999)

  30. Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Languag. Syst. 13(1), 124-149 (1991)

  31. Jayanti, P., Chandra, T., Toueg, S.: Fault-tolerant wait-free shared objects. J. ACM 45(3), 451-500 (1998)

  32. Keidar, I., Dolev, D.: Totally ordered broadcast in the face of network partitions: exploiting group communication for replication in partitionable networks. In: Avresky, D. (ed.). Dependable Network Computing, Chap. 3. Kluwer Academic Publications (2000)

  33. Lamport, L.: Time, clocks, and the ordering of events in distributed systems. Commun. ACM 21(7), 558-565 (1978)

  34. Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133-169 (1998)

  35. Lamport, L.: Paxos made simple. Distrib. Comput. Column. SIGACT News 32(4), 34-58 (2001)

  36. Lamport, L., Shostak, R., Pease, M.: The Byzantine generals problem. ACM Trans. Program. Languag. Syst. 4(3), 382-401 (1982)

  37. Lampson, B.W.: How to build a highly available system using consensus. In: Proceedings of the 10th International Workshop on Distributed Algorithms (WDAG), LNCS 1151. Springer-Verlag, Berlin (1996)

  38. Lee, E.K., Thekkath, C.: Petal: distributed virtual disks. In: Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pp. 84-92 (1996)

  39. Lo, W.K., Hadzilacos, V.: Using failure detectors to solve consensus in asynchronous shared-memory systems. In: Proceedings of the 8th International Workshop on Distributed Algorithms (WDAG), LNCS 857, pp. 280-295. Springer-Verlag, Berlin (1994)

  40. Loui, M.C., Abu-Amara, H.H.: Memory requirements for agreement among unreliable asynchronous processes. In: Preparata, F.P. (ed.). Parallel and Distributed Computing: vol. 4 of Advances in Computing Research, pp. 163-183. JAI Press, Greenwich, Conn. (1987)

  41. Malkhi, D.: From Byzantine agreement to practical survivability. In: The International Workshop on Self-Repairing and Self-Configurable Distributed Systems (RCDS'2002) Osaka, Japan (2002)

  42. Malkhi, D., Reiter, M.K.: An architecture for survivable coordination in large-scale systems. IEEE Transact. Knowledge Data Eng. 12(2), 187-202 (2000)

  43. Merritt, M., Taubenfeld, G.: Computing with infinitely many processes. In: Proceedings of 14th International Symposium on Distributed Computing (DISC'2000), pp. 164-178 (2000)

  44. Mostéfaoui, A., Raynal, M.: Leader-based consensus. Parallel Process. Lett. 11(1), 95-107 (2001)

  45. National Storage Industry Consortium. http://www.nsic.org/nasd

  46. Powell, D. (ed.): Group communication. Commun. ACM 39(4), 50-97 (1996)

  47. Riedel, E., Faloutsos, C., Gibson, G.A., Nagle, D.: Active disks for large-scale data processing. IEEE Comput. 68-74 (2001)

  48. Skeen, M.D.: Nonblocking commit protocols. In: SIGMOD International Conference on Management of Data (1981)

  49. Skeen, M.D.: Crash recovery in a distributed database system. PhD Thesis, UC Berkeley (1982)

  50. Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. 22(4), 299-319 (1990)

  51. Thekkath, C., Mann, T., Lee, E.K.: Frangipani: a scalable distributed file system. In: Proceedings of the 16th ACM Symposium on Operating Systems Principles, pp. 224-237 (1997)

Author information

Corresponding author

Correspondence to Dahlia Malkhi.

Additional information

A preliminary version of this work appeared in the Proceedings of the 21st ACM Symposium on Principles of Distributed Computing (PODC 2002), August 2002.

About this article

Cite this article

Chockler, G., Malkhi, D. Active Disk Paxos with infinitely many processes. Distrib. Comput. 18, 73–84 (2005). https://doi.org/10.1007/s00446-005-0123-x

  • DOI: https://doi.org/10.1007/s00446-005-0123-x
