Service fault tolerance for highly reliable service-oriented systems: an overview

An overview of service fault tolerance techniques for highly reliable service-oriented systems

  • Review
  • Special Focus on High-Confidence Software Technologies
  • Science China Information Sciences

Abstract

Service-oriented systems are widely employed in e-business, e-government, finance, management systems, and other domains. Service fault tolerance is one of the most important techniques for building highly reliable service-oriented systems. In this paper, we provide an overview of service fault tolerance techniques in three sections: fault tolerance strategy design, fault tolerance strategy selection, and Byzantine fault tolerance. In the first section, we introduce the design of static and dynamic fault tolerance strategies, as well as the major problems that arise when designing them. In the second section, building on these strategies, we show how to identify the significant components of a complex service-oriented system and investigate algorithms for optimal fault tolerance strategy selection. Finally, in the third section, we discuss a special type of service fault tolerance technique, namely Byzantine fault tolerance.

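Abstract (translated from Chinese)

Service-oriented systems are widely used in e-commerce, e-government, finance, management systems, and other fields. Service fault tolerance is one of the most important techniques for building highly reliable service-oriented systems. This paper presents an overview of service fault tolerance techniques in three parts: fault tolerance strategy design, fault tolerance strategy selection, and Byzantine fault tolerance. The first part focuses on the design of static and dynamic fault tolerance strategies and on the main problems that must be solved when designing them. Given the wide variety of fault tolerance strategies, the second part covers methods for quickly locating the critical modules of a complex service-oriented system, along with algorithms for selecting the optimal fault tolerance strategy. Finally, the third part discusses a special service fault tolerance technique, Byzantine fault tolerance.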

Author information

Corresponding author

Correspondence to HuaiMin Wang.

About this article

Cite this article

Zheng, Z., Lyu, M.R.T. & Wang, H. Service fault tolerance for highly reliable service-oriented systems: an overview. Sci. China Inf. Sci. 58, 1–12 (2015). https://doi.org/10.1007/s11432-015-5313-y

  • DOI: https://doi.org/10.1007/s11432-015-5313-y
