Summary.
This work considers the problem of performing t tasks in a distributed system of p fault-prone processors. This problem, called do-all herein, was introduced by Dwork, Halpern and Waarts. The solutions presented here are for the model of computation that abstracts a synchronous message-passing distributed system with processor stop-failures and restarts. We present two new algorithms based on a new aggressive coordination paradigm by which multiple coordinators may be active as the result of failures. The first algorithm is tolerant of \(f < p\) stop-failures and does not allow restarts. Its available processor steps (work) complexity is \(S = {\cal O}((t + p\log p/\log\log p) \cdot\log f)\) and its message complexity is \(M = {\cal O}(t + p\log p/\log\log p + fp)\). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for p = t and large f, it achieves better work complexity. This algorithm is used as the basis for another algorithm that tolerates stop-failures and restarts. This new algorithm is the first solution for the do-all problem that efficiently deals with processor restarts. Its available processor steps complexity is \(S = {\cal O}((t + p\log p + f)\cdot \min\{\log p,\log f\})\), and its message complexity is \(M={\cal O}(t+p\log p+ f p)\), where f is the total number of failures.
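To get a feel for how the stated bounds scale, the following minimal sketch evaluates the expressions inside the \({\cal O}(\cdot)\) terms for concrete values of t, p and f. It is an illustration only: the function names are hypothetical, the hidden constants are assumed to be 1, logarithms are assumed to be base 2, and the inputs are assumed to satisfy p >= 4 and f >= 2 so that every logarithm is positive. It does not implement either algorithm.

```python
import math


def work_no_restarts(t, p, f):
    """Expression inside S = O((t + p log p / log log p) * log f)."""
    # Assumption: base-2 logs, constant factor 1, p >= 4, f >= 2.
    return (t + p * math.log2(p) / math.log2(math.log2(p))) * math.log2(f)


def messages_no_restarts(t, p, f):
    """Expression inside M = O(t + p log p / log log p + f p)."""
    return t + p * math.log2(p) / math.log2(math.log2(p)) + f * p


def work_with_restarts(t, p, f):
    """Expression inside S = O((t + p log p + f) * min{log p, log f})."""
    return (t + p * math.log2(p) + f) * min(math.log2(p), math.log2(f))


def messages_with_restarts(t, p, f):
    """Expression inside M = O(t + p log p + f p)."""
    return t + p * math.log2(p) + f * p


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the article.
    t, p, f = 16, 16, 4
    print("no restarts:   S ~", work_no_restarts(t, p, f),
          " M ~", messages_no_restarts(t, p, f))
    print("with restarts: S ~", work_with_restarts(t, p, f),
          " M ~", messages_with_restarts(t, p, f))
```

Because asymptotic bounds hide constant factors, these numbers are useful only for comparing growth trends as t, p and f vary, not as predictions of actual step or message counts.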
Received: October 1998 / Accepted: September 2000
Cite this article
Chlebus, B., De Prisco, R. & Shvartsman, A. Performing tasks on synchronous restartable message-passing processors. Distrib Comput 14, 49–64 (2001). https://doi.org/10.1007/PL00008926