Abstract
The problem of performing t tasks in a distributed system of p processors is studied. The tasks are assumed to be independent, similar (each takes one step to complete), and idempotent (each may be performed many times and concurrently). The processors communicate by passing messages, and each of them may fail. This problem, usually called DO-ALL, was introduced by Dwork, Halpern and Waarts.
The distributed setting considered in this paper is as follows: the system is synchronous, the processors fail by stopping, and reliable multicast is available. The occurrence of faults is modeled by an adversary who must choose at least c · p processors prior to the start of the computation, for a fixed constant 0 < c < 1; the selected processors never fail, while any of the remaining processors may be failed at any time.
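For intuition, here is a minimal sketch of this fault model (illustrative only; the function and parameter names are assumptions, not from the paper): the adversary commits, before the computation starts, to a set of at least c · p processors that will never fail, and may crash any of the remaining processors at an arbitrary step or leave them alone.

```python
import math
import random

def sample_fault_pattern(p, c, horizon, rng=None):
    """Illustrative sketch of the constrained adversary (hypothetical helper).
    At least ceil(c * p) processors are selected before the computation starts
    and never fail; each remaining processor may be crashed at an arbitrary
    step, or spared."""
    rng = rng or random.Random(0)
    reliable = set(rng.sample(range(p), math.ceil(c * p)))
    crash_step = {}
    for proc in range(p):
        if proc in reliable:
            continue                    # selected processors must not fail
        if rng.random() < 0.5:          # the adversary may also spare a processor
            crash_step[proc] = rng.randrange(horizon)  # crash-stop at this step
    return reliable, crash_step

# Example: p = 8 processors, constant c = 1/4, a computation of 10 steps.
reliable, crashes = sample_fault_pattern(p=8, c=0.25, horizon=10)
print(sorted(reliable), crashes)
```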
The main result shows a sharp difference between the expected performance of randomized algorithms and the worst-case performance of deterministic algorithms solving the DO-ALL problem in this setting.
Performance is measured in terms of the work and communication of algorithms. Work is the total number of steps performed by all the processors while they are operational, including idling. Communication is the total number of point-to-point messages exchanged. Let effort be the sum of work and communication. A randomized algorithm is developed whose expected effort is O(t + p · (1 + log* p − log*(p/t))), where log* x is the number of times the log function must be iterated to bring x down to at most 1. For deterministic algorithms, a lower bound of Ω(t + p · log t / log log t) on worst-case work holds, and it is matched by the work of a simple algorithm.
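To make the stated bounds concrete, the following sketch (function names and the choice of base 2 are assumptions for illustration; all hidden constants are dropped) computes log* and evaluates the shapes of the two expressions.

```python
import math

def log_star(x, base=2.0):
    """Number of times log must be iterated to bring x down to at most 1."""
    count = 0
    while x > 1:
        x = math.log(x, base)
        count += 1
    return count

def randomized_expected_effort(p, t):
    # Shape of O(t + p * (1 + log* p - log*(p/t))); constants omitted.
    return t + p * (1 + log_star(p) - log_star(p / t))

def deterministic_work_lower_bound(p, t):
    # Shape of Omega(t + p * log t / log log t); constants omitted.
    return t + p * math.log(t, 2) / math.log(math.log(t, 2), 2)

p = t = 1_000_000
print(randomized_expected_effort(p, t))      # t + p * (1 + log* p), since log*(1) = 0
print(deterministic_work_lower_bound(p, t))  # t + p * log t / log log t
```

The comparison is asymptotic: log* p grows far more slowly than log t / log log t, which is the gap between the randomized upper bound and the deterministic lower bound stated above.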
This research was supported by KBN contract 8 T11C 036 14.
References
R.J. Anderson and H. Woll, Wait-Free Parallel Algorithms for the Union-Find Problem, in Proc. 23rd Symp. on Theory of Computing, 1991, pp. 370–380.
J. Buss, P.C. Kanellakis, P. Ragde, and A.A. Shvartsman, Parallel Algorithms with Processor Failures and Delays, J. Algorithms, 20 (1996) 45–86.
B.S. Chlebus, R. De Prisco, and A.A. Shvartsman, Performing Tasks on Restartable Message-Passing Processors, in Proc. 11th International Workshop on Distributed Algorithms, 1997, LNCS 1320, pp. 96–110.
R. De Prisco, A. Mayer, and M. Yung, Time-Optimal Message-Efficient Work Performance in the Presence of Faults, in Proc. 13th Symp. on Principles of Distributed Computing, 1994, pp. 161–172.
C. Dwork, J. Halpern, and O. Waarts, Performing Work Efficiently in the Presence of Faults, SIAM J. on Computing, 27 (1998) 1457–1491.
Z. Galil, A. Mayer, and M. Yung, Resolving Message Complexity of Byzantine Agreement and Beyond, in Proc. 36th Symp. on Foundations of Computer Science, 1995, pp. 724–733.
G. Grimmett and D. Stirzaker, “Probability and Random Processes,” Oxford University Press, 1992.
V. Hadzilacos and S. Toueg, Fault-Tolerant Broadcasts and Related Problems, in “Distributed Systems,” 2nd Ed., S. Mullender, ed., Addison-Wesley and ACM Press, 1993.
P.C. Kanellakis and A.A. Shvartsman, Efficient Parallel Algorithms Can Be Made Robust, Distributed Computing, 5 (1992) 201–217.
P.C. Kanellakis and A.A. Shvartsman, “Fault-Tolerant Parallel Computation,” Kluwer Academic Publishers, 1997.
Z.M. Kedem, K.V. Palem, M.O. Rabin, and A. Raghunathan, Efficient Program Transformations for Resilient Parallel Computation via Randomization, in Proc. 24th Symp. on Theory of Computing, 1992, pp. 306–318.
Z.M. Kedem, K.V. Palem, A. Raghunathan, and P. Spirakis, Combining Tentative and Definite Executions for Dependable Parallel Computing, in Proc. 23rd Symp. on Theory of Computing, 1991, pp. 381–390.
Z.M. Kedem, K.V. Palem, and P. Spirakis, Efficient Robust Parallel Computations, in Proc. 22nd Symp. on Theory of Computing, 1990, pp. 138–148.
C. Martel and R. Subramonian, On the Complexity of Certified Write-All Algorithms, J. Algorithms, 16 (1994) 361–387.
C. Martel, A. Park, and R. Subramonian, Work-Optimal Asynchronous Algorithms for Shared Memory Parallel Computers, SIAM J. Comput., 21 (1992) 1070–1099.
C. McDiarmid, On the Method of Bounded Differences, in J. Siemon, ed., “Surveys in Combinatorics,” Cambridge University Press, 1989, pp. 148–188, London Math. Soc. Lecture Note Series 141.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
Cite this paper
Chlebus, B.S., Kowalski, D.R. (1999). Randomization Helps to Perform Tasks on Processors Prone to Failures. In: Jayanti, P. (eds) Distributed Computing. DISC 1999. Lecture Notes in Computer Science, vol 1693. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48169-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66531-1
Online ISBN: 978-3-540-48169-0