A Task Replication and Fair Resource Management Scheme for Fault Tolerant Grids

  • Conference paper
Advances in Grid Computing - EGC 2005 (EGC 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3470)

Abstract

In this paper we study a fault-tolerant model for Grid environments based on task replication. The basic idea is to produce and submit to the Grid multiple replicas of a given task, under the assumption that the failure probability of each replica is known a priori. We introduce a scheme for calculating the number of replicas when the replicas have differing failure probabilities, and we propose an efficient resource management scheme, based on a fair-share technique, that handles the task replicas so that fault tolerance is maintained fairly across the Grid. The study concludes with simulation results that validate the proposed scheme.
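
The replica-count idea in the abstract can be illustrated with a minimal sketch. It is not the scheme from the paper, whose details are not given on this page. It assumes that replica failures are independent, that a task fails only if every one of its replicas fails, and that replicas are added in the order supplied until the probability of total failure drops to a chosen target. The function name `replicas_needed` and the parameter `max_task_failure` are hypothetical.

```python
from typing import Sequence


def replicas_needed(failure_probs: Sequence[float], max_task_failure: float) -> int:
    """Count how many replicas, taken in the given order, are needed so that
    the probability that ALL of them fail is at most max_task_failure.

    Assumption (not from the paper): replica failures are independent, so the
    probability that every replica fails is the product of the individual
    failure probabilities.
    """
    if not 0.0 < max_task_failure < 1.0:
        raise ValueError("max_task_failure must lie strictly between 0 and 1")

    all_fail = 1.0  # probability that every replica chosen so far fails
    for count, p in enumerate(failure_probs, start=1):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid failure probability: {p}")
        all_fail *= p
        if all_fail <= max_task_failure:
            return count

    # The offered replicas are not enough to reach the target.
    raise ValueError("target failure probability cannot be met")


if __name__ == "__main__":
    # Four candidate replicas, each failing half of the time, with a target of
    # at most a 10% chance that the whole task fails.
    print(replicas_needed([0.5, 0.5, 0.5, 0.5], 0.1))
```

With differing failure probabilities, the order in which replicas are considered changes the count: sorting the candidates from most to least reliable before calling the function gives the smallest number of replicas under these assumptions.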

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Litke, A., Tserpes, K., Dolkas, K., Varvarigou, T. (2005). A Task Replication and Fair Resource Management Scheme for Fault Tolerant Grids. In: Sloot, P.M.A., Hoekstra, A.G., Priol, T., Reinefeld, A., Bubak, M. (eds) Advances in Grid Computing - EGC 2005. EGC 2005. Lecture Notes in Computer Science, vol 3470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508380_104

  • DOI: https://doi.org/10.1007/11508380_104

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26918-2

  • Online ISBN: 978-3-540-32036-4

  • eBook Packages: Computer Science, Computer Science (R0)
