Self-configuring Algorithm for Software Fault Tolerance in (n,k)-way Cluster Systems

  • Conference paper
Computational Science and Its Applications — ICCSA 2003 (ICCSA 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2667)

Abstract

Complex software-intensive applications can be built on cluster systems assembled from commercially available systems. To improve the availability of (n,k)-way cluster systems, we develop a self-configuring algorithm that not only determines the number of primary and backup nodes needed to meet the availability requirement and the waiting-time deadline, but also uses software rejuvenation to deal with dormant software faults. The availability model of (n,k)-way cluster systems with software rejuvenation represents the fault-tolerance and switchover states as a semi-Markov process. Steady-state probabilities and availability are calculated from the operating parameters and are then used by the self-configuring algorithm.
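
As a rough illustration of the sizing step described in the abstract, the sketch below picks the smallest number of backup nodes that meets an availability target. It is not the paper's semi-Markov model: it assumes that nodes fail independently, that each node has a known steady-state availability, and that the cluster is up whenever at least n of its n + k nodes are up. It ignores switchover states, rejuvenation, and the waiting-time deadline. The function names `cluster_availability` and `min_backups` and the parameter `max_k` are hypothetical.

```python
from math import comb


def cluster_availability(n: int, k: int, node_avail: float) -> float:
    """Probability that at least n of n + k independent nodes are up.

    Assumption (not from the paper): nodes are independent and share the
    same steady-state availability `node_avail`.
    """
    total = n + k
    return sum(
        comb(total, up) * node_avail**up * (1.0 - node_avail) ** (total - up)
        for up in range(n, total + 1)
    )


def min_backups(n: int, node_avail: float, target: float, max_k: int = 16):
    """Smallest k in 0..max_k whose cluster availability reaches `target`.

    Returns None when no k in that range is sufficient.
    """
    for k in range(max_k + 1):
        if cluster_availability(n, k, node_avail) >= target:
            return k
    return None


if __name__ == "__main__":
    # One primary node, per-node availability 0.9, target 0.995.
    print(min_backups(n=1, node_avail=0.9, target=0.995))
```

In a fuller treatment, the per-node availability would itself come from the steady-state probabilities of the semi-Markov model, and the search would also check the waiting-time constraint before accepting a configuration.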

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Choi, C., Kim, S. (2003). Self-configuring Algorithm for Software Fault Tolerance in (n,k)-way Cluster Systems. In: Kumar, V., Gavrilova, M.L., Tan, C.J.K., L’Ecuyer, P. (eds) Computational Science and Its Applications — ICCSA 2003. ICCSA 2003. Lecture Notes in Computer Science, vol 2667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44839-X_78

  • DOI: https://doi.org/10.1007/3-540-44839-X_78

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40155-1

  • Online ISBN: 978-3-540-44839-6

  • eBook Packages: Springer Book Archive
