Abstract
In the online checkpointing problem, the task is to continuously maintain a set of k checkpoints that allow an ongoing computation to be rewound faster than by a full restart. The only operation allowed is to replace an old checkpoint with the current state. We aim for checkpoint placement strategies that minimize the rewinding cost, i.e., such that at all times T, when requested to rewind to some time t ≤ T, the number of computation steps that must be redone to reach t from a checkpoint before t is as small as possible. In particular, we want the closest checkpoint earlier than t to be no further from t than q_k times the ideal distance T / (k + 1), where q_k is a small constant.
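The quality measure described above can be illustrated with a minimal Python sketch. It assumes, as a modelling choice not spelled out in the abstract, that time 0 always serves as a free restart point and that the worst-case rewind target lies just before the end of the longest gap between consecutive checkpoints (or between the last checkpoint and the current time T). The function name `discrepancy` is hypothetical.

```python
from fractions import Fraction


def discrepancy(checkpoints, T):
    """Ratio of the longest gap to the ideal gap T / (k + 1).

    checkpoints: times of the k stored checkpoints, each in (0, T].
    T: current time of the computation.
    Assumption: time 0 is always available as a restart point.
    """
    k = len(checkpoints)
    # Interval endpoints: start of computation, the checkpoints, current time.
    points = [0] + sorted(checkpoints) + [T]
    max_gap = max(b - a for a, b in zip(points, points[1:]))
    # Dividing by the ideal gap T / (k + 1) is multiplying by (k + 1) / T.
    return Fraction(max_gap) * (k + 1) / T


# Evenly spaced checkpoints reach the ideal ratio of 1.
print(discrepancy([1, 2, 3], 4))
# Checkpoints bunched near the start give a worse ratio.
print(discrepancy([1, 2], 6))
```

A ratio of 1 means the checkpoints are perfectly evenly spread at that moment; the difficulty of the online problem is that this ratio must stay small at every time T, while only replacements of old checkpoints by the current state are allowed.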
Improving over earlier work showing 1 + 1/k ≤ q_k ≤ 2, we show that q_k can be chosen asymptotically less than 2. We present algorithms with asymptotic discrepancy q_k ≤ 1.59 + o(1) valid for all k and q_k ≤ ln(4) + o(1) ≤ 1.39 + o(1) valid for k being a power of two. Experiments indicate the uniform bound q_k ≤ 1.7 for all k. For small k, we show how to use a linear programming approach to compute good checkpointing algorithms. This gives discrepancies of less than 1.55 for all k < 60.
We prove the first lower bound that is asymptotically more than one, namely q_k ≥ 1.30 − o(1). We also show that optimal algorithms (yielding the infimum discrepancy) exist for all k.
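As a quick arithmetic check of the constants quoted in the abstract, the following sketch places the asymptotic bounds side by side. The o(1) terms are dropped, so the comparison is only meaningful for large k; all variable and function names are illustrative.

```python
import math

# Asymptotic constants as quoted in the abstract (o(1) terms dropped).
LOWER_BOUND = 1.30                  # new lower bound
UPPER_POWER_OF_TWO = math.log(4)    # upper bound for k a power of two
UPPER_ALL_K = 1.59                  # upper bound for all k
PREVIOUS_UPPER = 2.0                # upper bound from earlier work


def previous_lower(k):
    """Earlier lower bound 1 + 1/k, which tends to 1 as k grows."""
    return 1 + 1 / k


# ln(4) is indeed below the rounded value 1.39 used in the abstract.
print(round(UPPER_POWER_OF_TWO, 4))
# Remaining asymptotic gap between lower and upper bound for powers of two.
print(round(UPPER_POWER_OF_TWO - LOWER_BOUND, 4))
```

For k a power of two, the asymptotic gap between the lower bound and the upper bound is therefore below 0.09, compared with a gap of almost 1 between the earlier bounds.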
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bringmann, K., Doerr, B., Neumann, A., Sliacan, J. (2013). Online Checkpointing with Improved Worst-Case Guarantees. In: Fomin, F.V., Freivalds, R., Kwiatkowska, M., Peleg, D. (eds) Automata, Languages, and Programming. ICALP 2013. Lecture Notes in Computer Science, vol 7965. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39206-1_22
DOI: https://doi.org/10.1007/978-3-642-39206-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39205-4
Online ISBN: 978-3-642-39206-1
eBook Packages: Computer Science, Computer Science (R0)