A general method for maximizing the error-detecting ability of distributed algorithms

  • Conference paper
PARLE'94 Parallel Architectures and Languages Europe (PARLE 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 817)

Abstract

The bound on component failures and their spatial distribution govern the fault tolerance of any candidate error-detecting algorithm. For distributed memory multiprocessors, the specific algorithm and the topology of the processor interconnection network define these bounds. This paper introduces the maximal fault index, derived from the system topology and local communication patterns, to demonstrate how a maximal number of simultaneous component failures can be tolerated for a particular interconnection network and error-detecting algorithm. The index is used to design a mapping of processes to processor groups such that the error-detecting ability of the algorithm is preserved for certain multiple simultaneous processor failures.
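The interplay between the number of failures and their placement can be illustrated with a toy model. The sketch below is not the paper's maximal fault index, whose definition the abstract does not give. It assumes a hypothetical rule under which a set of failed processors is detectable when every failed processor still has at least one non-failed neighbor in the interconnection network; the names `ring`, `detectable`, and `max_tolerated_failures` are invented for this illustration.

```python
from itertools import combinations


def ring(n):
    """Adjacency sets of an n-node ring (an assumed example topology)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def detectable(failed, adjacency):
    """Toy rule (an assumption, not the paper's criterion): a failure set is
    detectable if every failed processor keeps a non-failed neighbor."""
    return all(adjacency[p] - failed for p in failed)


def max_tolerated_failures(adjacency):
    """Largest f such that every set of f simultaneous failures is detectable
    under the toy rule. Brute force, so only suitable for small networks."""
    nodes = list(adjacency)
    f = 0
    for k in range(1, len(nodes) + 1):
        if all(detectable(set(c), adjacency) for c in combinations(nodes, k)):
            f = k
        else:
            break  # a bad k-set can always be extended to a bad (k+1)-set
    return f


net = ring(6)
print(max_tolerated_failures(net))   # worst-case bound for the 6-node ring
print(detectable({0, 2, 4}, net))    # three spread-out failures
print(detectable({0, 1, 2}, net))    # three adjacent failures
```

In this model the worst-case bound for the 6-node ring is two failures, yet three failures that are spread apart remain detectable while three adjacent ones do not, which is the sense in which spatial distribution matters alongside the count.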

This work was supported in part by the National Science Foundation under grant numbers MSS-9216479 and CDA-9222827, in part by the Air Force Office of Scientific Research under contract numbers F49620-92-J-0546 and F49620-93-I-0409, and in part by a grant from the University of Missouri Research Board.

Editor information

Costas Halatsis, Dimitrios Maritsas, George Philokyprou, Sergios Theodoridis

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schollmeyer, M., McMillin, B. (1994). A general method for maximizing the error-detecting ability of distributed algorithms. In: Halatsis, C., Maritsas, D., Philokyprou, G., Theodoridis, S. (eds) PARLE'94 Parallel Architectures and Languages Europe. PARLE 1994. Lecture Notes in Computer Science, vol 817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58184-7_144

  • DOI: https://doi.org/10.1007/3-540-58184-7_144

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58184-0

  • Online ISBN: 978-3-540-48477-6

  • eBook Packages: Springer Book Archive
