Digital Library

of the European Council for Modelling and Simulation

 

Title:

The Median Resource Failure Checkpointing

Authors:

Suleman Khan, Khizar Hayat, Sajjad A. Madani, Samee U. Khan, Joanna Kolodziej

Published in:

 

(2012). ECMS 2012 Proceedings edited by: K. G. Troitzsch, M. Moehring, U. Lotzmann. European Council for Modeling and Simulation. doi:10.7148/2012

 

ISBN: 978-0-9564944-4-3

 

26th European Conference on Modelling and Simulation,

Shaping reality through simulation

Koblenz, Germany, May 29 – June 1 2012

 

Citation format:

Khan, S., Hayat, K., Madani, S. A., Khan, S. U., & Kolodziej, J. (2012). The Median Resource Failure Checkpointing. ECMS 2012 Proceedings edited by: K. G. Troitzsch, M. Moehring, U. Lotzmann (pp. 483-489). European Council for Modeling and Simulation. doi:10.7148/2012-0483-0489

DOI:

http://dx.doi.org/10.7148/2012-0483-0489

Abstract:

In grid computing, the realization of an enviable fault tolerance ability is linked with the proper utilization of resources and scheduling of jobs. The literature offers two solutions to these two challenging tasks, viz. checkpointing and replication. A checkpointing strategy is being proposed that uses the median of failure intervals of the resources in deciding the checkpoint intervals for the given jobs. The strategy shows improved system throughput, job losses and job execution times while eliminating unnecessary checkpoints.
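
The core idea stated in the abstract, deriving a checkpoint interval from the median of a resource's failure intervals, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are in the full text. The function names, the fallback value, and the sample failure log below are all hypothetical, and the sketch assumes that a "failure interval" means the gap between consecutive failure timestamps of a single resource.

```python
from statistics import median


def failure_intervals(failure_times):
    """Return the gaps between consecutive failure timestamps.

    Assumption: a "failure interval" is the time between two successive
    failures of the same resource.
    """
    times = sorted(failure_times)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def checkpoint_interval(failure_times, default_interval):
    """Pick a checkpoint interval as the median failure interval.

    Falls back to default_interval (a hypothetical parameter) when the
    resource has fewer than two recorded failures, so no gap exists.
    """
    gaps = failure_intervals(failure_times)
    if not gaps:
        return default_interval
    return median(gaps)


# Hypothetical failure log for one resource, in hours since monitoring began.
resource_failures = [5, 12, 30, 36, 50]
interval = checkpoint_interval(resource_failures, default_interval=10)
print(interval)
```

The median is less sensitive than the mean to a few unusually long or short gaps between failures, which is one plausible reason to prefer it when a resource's failure history is irregular.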

Full text: