On the completion time distribution for tasks that must restart from the beginning if a failure occurs

Published: 01 December 2006
Abstract

For many systems, failure is so common that the design choice of how to deal with it may have a significant impact on the performance of the system. There are many specific and distinct failure recovery schemes, but they can be grouped into three broad classes: RESUME, also referred to as preemptive resume (prs), or check-pointing; REPLACE, also referred to as preemptive repeat different (prd); and RESTART, also referred to as preemptive repeat identical (pri). The three recovery schemes work as follows: (1) RESUME: when a task fails, it knows exactly where it stopped, and can continue from that point when allowed to resume; (2) REPLACE: when a task fails and later begins processing again, it starts with a brand new task sampled from the same task time distribution; and (3) RESTART: when a task fails, it loses all that it had acquired up to that point and must start anew when it continues later. RESTART is distinctly different from REPLACE, since the restarted task must run at least as long as it did before it failed, whereas a new sample, selected at random, might run for a shorter or longer time.
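
The difference between the three schemes can be illustrated with a small Monte Carlo sketch. It is not the authors' model: it assumes failures arrive as a Poisson process with rate `failure_rate`, ignores any repair or down time, and uses hypothetical names (`completion_time`, `sample_task`) chosen only for this illustration.

```python
import random

SCHEMES = ("RESUME", "REPLACE", "RESTART")


def run_attempt(work_needed, failure_rate, rng):
    """Run one attempt; return (finished, time_spent).

    Assumption: the time until the next failure is exponential with the
    given rate, i.e. failures form a Poisson process.
    """
    time_to_failure = rng.expovariate(failure_rate)
    if time_to_failure >= work_needed:
        return True, work_needed
    return False, time_to_failure


def completion_time(scheme, sample_task, failure_rate, rng):
    """Total processing time until the task completes under a scheme.

    Repair/down time is assumed to be zero, so only processing time
    (useful and wasted) is counted.
    """
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    task_time = sample_task(rng)   # work required by the original task
    work_needed = task_time
    elapsed = 0.0
    while True:
        finished, spent = run_attempt(work_needed, failure_rate, rng)
        elapsed += spent
        if finished:
            return elapsed
        if scheme == "RESUME":
            work_needed -= spent             # progress is kept
        elif scheme == "REPLACE":
            work_needed = sample_task(rng)   # brand new sample
        else:  # RESTART
            work_needed = task_time          # same task, from the beginning


if __name__ == "__main__":
    rng = random.Random(2006)

    def exponential_task(r):
        # Assumed task time distribution: exponential with mean 1.
        return r.expovariate(1.0)

    for scheme in SCHEMES:
        times = [completion_time(scheme, exponential_task, 0.5, rng)
                 for _ in range(2000)]
        print(scheme, "mean:", sum(times) / len(times), "max:", max(times))
```

Under these assumptions RESUME never wastes work, while RESTART repeats the same task time on every attempt, so a long task must survive one failure-free interval at least as long as itself.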

  • Published in

    ACM SIGMETRICS Performance Evaluation Review  Volume 34, Issue 3
    December 2006
    62 pages
    ISSN:0163-5999
    DOI:10.1145/1215956
    Copyright © 2006 Authors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • article
