Using replication and checkpointing for reliable task management in computational Grids

Abstract:

In large-scale Grid computing environments, fault tolerance is required for both scientific computation and file sharing to increase their reliability. Previous works proposed several fault-tolerance mechanisms for Grids and other distributed computing systems. However, some of them use only space redundancy (hardware replication), while others use only time redundancy (checkpointing and rollback). As a result, the existing mechanisms use Grid resources inefficiently. The main goal of the proposed mechanism, ART, is to reduce the number of replicas by applying a checkpointing and rollback scheme to each replica. In ART, the minimum number of replicas is selected adaptively, based on an analysis of the probability that a task executes successfully within its given deadline and on the task's reliability requirement. Our simulation results show that ART can significantly reduce the number of replicas and improve scalability compared with existing mechanisms.
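The adaptive choice of replica count can be illustrated with a minimal sketch. The abstract does not state ART's actual probability model, so the following assumes, hypothetically, that replicas fail independently and that each replica (running with checkpointing and rollback) finishes before the deadline with a known probability p. It then picks the smallest n satisfying 1 - (1 - p)^n >= R, where R is the task's reliability requirement. The function name `min_replicas` and the `max_replicas` cap are illustrative and are not taken from the paper.

```python
def min_replicas(p_success: float, reliability: float, max_replicas: int = 64) -> int:
    """Return the smallest replica count n with 1 - (1 - p_success)**n >= reliability.

    Hypothetical model (not from the paper): replicas fail independently, and each
    one completes before the deadline with probability p_success.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    # Probability that all n replicas miss the deadline is (1 - p_success)**n,
    # so the task succeeds with probability 1 minus that value.
    for n in range(1, max_replicas + 1):
        if 1.0 - (1.0 - p_success) ** n >= reliability:
            return n
    raise ValueError("reliability target not reachable within max_replicas")


if __name__ == "__main__":
    # Higher per-replica success probability (e.g. from checkpointing) -> fewer replicas.
    print(min_replicas(0.6, 0.99))
    print(min_replicas(0.95, 0.99))
```

For example, with R = 0.99, a per-replica success probability of 0.6 would require 6 replicas, while 0.95 would require only 2. This reflects the intuition behind the mechanism: raising each replica's chance of finishing on time through checkpointing and rollback reduces how many replicas are needed.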

Published in: 2010 International Conference on High Performance Computing & Simulation

Date of Conference: 28 June 2010 - 02 July 2010

Date Added to IEEE Xplore: 12 August 2010

DOI: 10.1109/HPCS.2010.5547140

Conference Location: Caen, France