Abstract
We present a scheme that accelerates the detection of computing errors in bag-of-tasks applications executed on volunteer desktop grids by comparing intermediate checkpoints. The current state of the art validates results through replicated task execution. Our method also uses replication, but instead of comparing results only at the end of the replicated computations, we validate ongoing executions by comparing checkpoints taken at intermediate execution points. This scheme significantly reduces the time to detect a computational error, which we show with both theoretical analysis and simulation results. In particular, we develop a model that gives the benefit of intermediate checkpointing as a function of checkpoint frequency and error rate, and we confirm this model with simulation experiments. We find that with an error rate of 5% and a checkpoint frequency of 20 checkpoints per task, the gain is as high as 35% compared to the case where error detection is done only at the end of task execution; for higher checkpoint frequencies or higher error rates, the benefits are even greater. In addition, when an erroneous computation is detected at an intermediate execution point, we propose immediately replacing that computation with a correct replica from another worker. In this way, useful execution and further validation can continue from that point onward instead of being delayed.
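The core idea, comparing checkpoints of replicated executions as they progress rather than only their final results, can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names `digest`, `first_divergence`, and `detection_fraction` are hypothetical, the choice of SHA-256 for checkpoint digests is an assumption, and the sketch assumes that two replicas produce byte-identical checkpoints when both compute correctly.

```python
import hashlib
from typing import Optional, Sequence


def digest(checkpoint: bytes) -> str:
    """Return a short fingerprint of a checkpoint (SHA-256 is an assumption)."""
    return hashlib.sha256(checkpoint).hexdigest()


def first_divergence(
    replica_a: Sequence[bytes], replica_b: Sequence[bytes]
) -> Optional[int]:
    """Return the index of the first checkpoint whose digests differ.

    Returns None when every compared pair of checkpoints matches.
    """
    for index, (a, b) in enumerate(zip(replica_a, replica_b)):
        if digest(a) != digest(b):
            return index
    return None


def detection_fraction(
    divergence_index: Optional[int], checkpoints_per_task: int
) -> Optional[float]:
    """Fraction of the task executed when the mismatch is noticed.

    Assumes equally spaced checkpoints, with the last one at task completion.
    """
    if divergence_index is None:
        return None
    return (divergence_index + 1) / checkpoints_per_task


# Hypothetical example: 20 checkpoints per task, with one replica
# corrupted from the fifth checkpoint (index 4) onward.
correct = [f"state-{i}".encode() for i in range(20)]
faulty = correct[:4] + [f"corrupt-{i}".encode() for i in range(4, 20)]

where = first_divergence(correct, faulty)
print(where, detection_fraction(where, 20))
```

In this hypothetical run the mismatch surfaces a quarter of the way through the task, whereas comparing only final results would reveal it at completion. A mismatch between two replicas shows that at least one is wrong but not which one; deciding that would need a further replica or some other tie-breaking rule, which this sketch leaves out.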
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Araujo, F., Domingues, P., Kondo, D., Silva, L.M. (2008). Validating Desktop Grid Results By Comparing Intermediate Checkpoints. In: Gorlatch, S., Bubak, M., Priol, T. (eds) Achievements in European Research on Grid Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-72812-4_2
DOI: https://doi.org/10.1007/978-0-387-72812-4_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-72811-7
Online ISBN: 978-0-387-72812-4
eBook Packages: Computer Science, Computer Science (R0)