Improving checkpointing intervals by considering individual job failure probabilities | IEEE Conference Publication | IEEE Xplore