
Fault-Tolerant Scheduling for Bag-of-Tasks Grid Applications

  • Conference paper
Advances in Grid Computing - EGC 2005 (EGC 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3470)

Abstract

In this paper we propose a fault-tolerant scheduler for Bag-of-Tasks Grid applications, called WorkQueue with Replication Fault Tolerant (WQR-FT), obtained by adding checkpointing and replication to the WorkQueue with Replication (WQR) scheduling algorithm. Using discrete-event simulation, we show that WQR-FT not only ensures the successful completion of all the tasks in a bag, but also achieves better performance than WQR and than other fault-tolerant schedulers obtained by coupling WQR with replication only or with checkpointing only.
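The abstract names WQR-FT's ingredients but not its mechanics. As a rough illustration only, here is a simplified, time-stepped simulation (not the discrete-event model used in the paper) of a WQR-style scheduler extended with resubmission of failed tasks and checkpointing. The function `wqr_ft`, its parameters (`max_replicas`, `ckpt_interval`, `p_fail`), the independent per-step machine-failure model, and the assumption that checkpoints live on failure-free storage shared by all replicas of a task are hypothetical choices made for this sketch; it is not the authors' algorithm.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    tid: int
    work: int             # work units needed to finish the task
    checkpoint: int = 0   # progress saved by the latest checkpoint (assumed failure-free storage)
    done: bool = False


@dataclass
class Replica:
    task: Task
    progress: int         # work units completed by this replica


def wqr_ft(works, n_machines, max_replicas=2, ckpt_interval=5, p_fail=0.05, seed=0):
    """Return (makespan, tasks) for a toy WQR scheduler with resubmission and checkpointing."""
    rng = random.Random(seed)
    tasks = [Task(i, w) for i, w in enumerate(works)]
    pending = list(tasks)      # tasks with no live replica, served in FIFO order
    running = {}               # machine id -> Replica currently executing on it
    t = 0
    while not all(task.done for task in tasks):
        # WorkQueue: idle machines take pending tasks first; once the queue is
        # empty they replicate the running task with the fewest replicas.
        for m in range(n_machines):
            if m in running:
                continue
            if pending:
                task = pending.pop(0)
            else:
                counts = {}
                for r in running.values():
                    counts[r.task.tid] = counts.get(r.task.tid, 0) + 1
                candidates = [r.task for r in running.values()
                              if counts[r.task.tid] < max_replicas]
                if not candidates:
                    continue
                task = min(candidates, key=lambda tk: counts[tk.tid])
            # Checkpointing: a new replica resumes from the last saved state.
            running[m] = Replica(task, task.checkpoint)

        # Advance every replica by one time unit; each machine may fail.
        t += 1
        for m, rep in list(running.items()):
            if rng.random() < p_fail:
                del running[m]
                # Resubmission: requeue the task if this was its last live replica.
                alive = any(r.task is rep.task for r in running.values())
                if not rep.task.done and not alive:
                    pending.append(rep.task)
                continue
            rep.progress += 1
            if rep.progress % ckpt_interval == 0:
                rep.task.checkpoint = max(rep.task.checkpoint, rep.progress)
            if rep.progress >= rep.task.work:
                rep.task.done = True

        # As in WQR, kill the remaining replicas of tasks that have completed.
        for m, rep in list(running.items()):
            if rep.task.done:
                del running[m]
    return t, tasks


if __name__ == "__main__":
    # No failures and one machine per task: the makespan is the longest task.
    makespan, _ = wqr_ft([3, 5, 2], n_machines=3, p_fail=0.0)
    print(makespan)  # → 5
```

In this toy model, setting `max_replicas=1` disables replication, and setting `ckpt_interval` larger than every task's work effectively disables checkpointing, which loosely mirrors the "replication only" and "checkpointing only" variants the abstract compares against.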

This work has been supported by the Italian MIUR under the project Società dell’Informazione, Subproject 3 – Grid Computing: Tecnologie abilitanti ed applicazioni per eScience (Enabling Technologies and Applications for eScience), Law 449/97, year 1999.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Anglano, C., Canonico, M. (2005). Fault-Tolerant Scheduling for Bag-of-Tasks Grid Applications. In: Sloot, P.M.A., Hoekstra, A.G., Priol, T., Reinefeld, A., Bubak, M. (eds) Advances in Grid Computing - EGC 2005. EGC 2005. Lecture Notes in Computer Science, vol 3470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508380_64

  • DOI: https://doi.org/10.1007/11508380_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26918-2

  • Online ISBN: 978-3-540-32036-4

  • eBook Packages: Computer Science, Computer Science (R0)
