Transparent Fault Tolerance for Grid Applications

  • Conference paper
Advances in Grid Computing - EGC 2005 (EGC 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3470)


Abstract

A major challenge facing grid applications is the appropriate handling of failures. In this paper we address the problem of making parallel Java applications based on Remote Method Invocation (RMI) fault tolerant in a way that is transparent to the programmer. We use globally consistent checkpointing to avoid restarting long-running computations from scratch after a system crash. The application’s execution state can be captured at any time, even when some of the application’s threads are blocked waiting for the result of a (nested) remote method call. We modify only the program’s bytecode, which makes our solution independent of any particular Java Virtual Machine (JVM) implementation. The bytecode transformation algorithm performs a compile-time analysis to reduce the number of modifications to the application’s code, which has a direct impact on the application’s performance. The fault-tolerance extensions also cover RMI components such as the RMI registry. Because essential data such as checkpoints are replicated, our system is resilient to simultaneous failures of multiple machines. Experimental results show that our fault-tolerance extensions incur negligible performance overhead.
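To make the basic idea of checkpoint and restart concrete, the following is a minimal sketch in plain Java. It is not the mechanism described in the abstract: it checkpoints a single process explicitly through Java serialization, whereas the paper captures execution state transparently through bytecode transformation and coordinates checkpoints across RMI processes. The class `CheckpointSketch`, its `State` type, the file name, and the `crashAt` parameter are all hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative only: explicit, single-process checkpointing (an assumption,
// not the transparent bytecode-level approach of the paper).
class CheckpointSketch {

    /** Hypothetical application state: next index to process and running sum. */
    static class State implements Serializable {
        private static final long serialVersionUID = 1L;
        long next = 0;
        long sum = 0;
    }

    /** Writes the state to a temporary file, then moves it over the previous
     *  checkpoint so a crash during writing does not truncate the old one.
     *  The move is not guaranteed to be atomic on every file system. */
    static void save(State s, Path file) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(s);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Restores the last checkpoint, or returns a fresh state if none exists. */
    static State load(Path file) throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) {
            return new State();
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (State) in.readObject();
        }
    }

    /** Sums 0..limit-1, checkpointing every `interval` steps.
     *  Throws at index `crashAt` to simulate a crash (use -1 for no crash). */
    static long run(Path file, long limit, long interval, long crashAt) throws Exception {
        State s = load(file);
        while (s.next < limit) {
            if (s.next == crashAt) {
                throw new IllegalStateException("simulated crash");
            }
            s.sum += s.next;
            s.next++;
            if (s.next % interval == 0) {
                save(s, file);
            }
        }
        return s.sum;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempDirectory("ckpt").resolve("state.bin");
        try {
            run(file, 10, 5, 7);            // crashes after the checkpoint taken at index 5
        } catch (IllegalStateException e) {
            System.out.println("crashed; restarting from last checkpoint");
        }
        System.out.println(run(file, 10, 5, -1)); // prints 45
    }
}
```

The second call to `run` resumes from index 5 rather than 0, so only the work done after the last checkpoint is repeated. In this sketch the programmer must decide what to save and when, which is exactly the burden that a transparent approach aims to remove.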

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Garbacki, P., Biskupski, B., Bal, H. (2005). Transparent Fault Tolerance for Grid Applications. In: Sloot, P.M.A., Hoekstra, A.G., Priol, T., Reinefeld, A., Bubak, M. (eds) Advances in Grid Computing - EGC 2005. EGC 2005. Lecture Notes in Computer Science, vol 3470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508380_68

  • DOI: https://doi.org/10.1007/11508380_68

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26918-2

  • Online ISBN: 978-3-540-32036-4

  • eBook Packages: Computer Science, Computer Science (R0)
