
Analysis of Relationship Between Modes of Intercomputer Communications and Fault Types in Redundant Computer Systems


Part of the book series: Lecture Notes in Computer Science (TCOMPUTATSCIE, volume 10220)

Abstract

This paper analyzes why Non-Byzantine and Byzantine fault types appear in redundant computer systems. The proposed approach is based on an analysis of the relationship between the modes of intercomputer communication and the fault types. This analysis allows designers to build redundant computer systems in which Byzantine faults cannot appear. Designing redundant computer systems in this way increases their reliability, because it prevents any form of fault appearance from being masked and reduces the checkpoint period. In addition, the approach decreases the cost of the software and hardware involved in executing fault-tolerant procedures.



Author information


Correspondence to Refik Samet.


Appendix I: Examples

Example 1

Consider a Redundant Computer System (RCS) consisting of four computers (nodes) (Fig. 1 with \(N=4\)). Suppose that the fourth node is faulty and the computational results of the non-faulty nodes are logical “1”. If, during the exchange of computational results, the fourth node sends the same value, logical “0”, to all other nodes, the fault type is Non-Byzantine.

figure a

As we see, the vectors in all non-faulty nodes contain the same values. However, if the fourth node sends different values, logical “0” to the first and third nodes and logical “1” to the second node, the fault type is Byzantine.

figure b

As we see, the vectors in the non-faulty nodes now contain different values. The indexed symbol x (“0” or “1”) denotes values held in the faulty node.
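The exchange in Example 1 can be replayed in a few lines of Python. This is a minimal sketch under assumed, hypothetical names (`exchange`, `fault_type`, `own_result`). Each non-faulty node builds its vector from the values it receives. The fault is classified as Non-Byzantine when every non-faulty node ends up with the same vector, and as Byzantine otherwise.

```python
# Hypothetical model of Example 1: nodes 1-3 are non-faulty, node 4 is faulty.
NODES = [1, 2, 3, 4]
FAULTY = 4
own_result = {1: 1, 2: 1, 3: 1}  # computational results of the non-faulty nodes


def exchange(sent_by_faulty):
    """Vectors formed in the non-faulty nodes after one exchange.

    sent_by_faulty[j] is the value the faulty node sends to node j; every
    non-faulty node sends its own result to everybody.
    """
    return {
        j: [sent_by_faulty[j] if n == FAULTY else own_result[n] for n in NODES]
        for j in NODES
        if j != FAULTY
    }


def fault_type(vectors):
    """Non-Byzantine if all non-faulty nodes hold identical vectors."""
    distinct = {tuple(v) for v in vectors.values()}
    return "Non-Byzantine" if len(distinct) == 1 else "Byzantine"


same = exchange({1: 0, 2: 0, 3: 0})       # logical "0" to every node
different = exchange({1: 0, 2: 1, 3: 0})  # "0" to nodes 1 and 3, "1" to node 2

print(same, fault_type(same))
print(different, fault_type(different))
```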

Example 2

To explain how the forms in which faults appear can be masked, we use the Determinate Byzantine agreement protocol. Suppose that \(N=4\) and \(k=1\); consequently, \(m=2\) (Table 3). We consider three cases: (1) there is no fault in the RCS; (2) a Byzantine fault appears in the RCS during the first intercomputer communication round; and (3) a Byzantine fault appears in the RCS during the second intercomputer communication round.

Example 2.1. First, consider the case in which there is no fault in the RCS. Suppose that the computational results of all nodes are logical “1”.

figure c

In the first intercomputer communication round, the nodes exchange their computational results. After the first round, the following vectors are formed.

figure d

In the second intercomputer communication round, the \(i^{th}\) node transmits to the \(j^{th}\) node the computational result it received from the \(n^{th}\) node in the first round, where \(i, j, n \in \{1, 2, 3, 4\}\) are pairwise distinct. After the second round, the matrices of computational results are formed.

figure e

After majority voting over the columns, the final vectors are formed: element \(a_{nn}\) is chosen as the majority of the values in the \(n^{th}\) column, where \(n = 1, 2, 3, 4\).

figure f

As we see, all matrices contain the same computational results and all nodes form the same final vector. This means that there is no fault in the RCS.
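Here is a minimal sketch of the two rounds of Example 2.1. It assumes the relay rule described above. Column \(n\) of node \(j\)'s matrix holds the value received directly from node \(n\), plus the values relayed by the two remaining nodes. A node's own column holds only its own result.

The helper names `two_rounds` and `majority` are hypothetical. Resolving a tied vote to a default value of 0 is also an assumption. With binary values and three entries per column, a tie cannot actually occur in this four-node setting.

```python
from collections import Counter

NODES = [1, 2, 3, 4]


def majority(values, default=0):
    """Most frequent value in a column; `default` on a tie (an assumption)."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]


def two_rounds(own, round1=None, round2=None):
    """Run both communication rounds and return {node: final vector}.

    own[n]             computational result of node n
    round1(n, j)       value node n sends to node j in round 1 (default: honest)
    round2(i, j, n, v) value node i relays to node j about node n, where v is
                       what i received from n in round 1 (default: honest)
    """
    round1 = round1 or (lambda n, j: own[n])
    round2 = round2 or (lambda i, j, n, v: v)
    # Round 1: received[j][n] is the value node j obtained directly from node n.
    received = {
        j: {n: (own[j] if n == j else round1(n, j)) for n in NODES}
        for j in NODES
    }
    final = {}
    for j in NODES:
        vector = []
        for n in NODES:
            column = [received[j][n]]  # direct value (node j's own result if n == j)
            if n != j:
                # Round 2: each node i distinct from j and n relays what it got from n.
                column += [round2(i, j, n, received[i][n]) for i in NODES if i not in (j, n)]
            vector.append(majority(column))
        final[j] = vector
    return final


final = two_rounds({1: 1, 2: 1, 3: 1, 4: 1})  # Example 2.1: no fault
print(final)
```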

Example 2.2. Now consider the case in which a Byzantine fault appears in the first intercomputer communication round. Suppose that the fourth node is faulty and the computational results of the nodes are logical “1”, “1”, “1” and “x” (“0” or “1”).

figure g

Suppose that during the first intercomputer communication round the faulty node sends logical “0” to the first and third nodes and logical “1” to the second node. After the first round, the following vectors are formed.

figure h

After the second round, the following matrices are formed; after majority voting over the columns, the following final vectors are formed.

figure i
figure j

As we see, the matrices in the non-faulty nodes contain different computational results. However, all non-faulty nodes form the same final vector, in which the fourth computational result differs from the others. The non-faulty nodes therefore determine that the fourth node is faulty. In this case, the Determinate Byzantine agreement algorithm transforms the Byzantine fault appearing in the first intercomputer communication round into the Non-Byzantine type.
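The same hypothetical model replays Example 2.2; it is repeated here so the block runs on its own. The second-round behaviour of the faulty node is not specified in this example, so two assumed behaviours are tried:

- an honest relay;
- a relay that inverts every value it passes on.

In both cases the non-faulty nodes agree on \((1, 1, 1, 0)\) and single out node 4. `suspects` is a hypothetical helper that flags nodes whose agreed value departs from the majority of the final vector.

```python
from collections import Counter

NODES = [1, 2, 3, 4]
FAULTY = 4


def majority(values, default=0):
    """Most frequent value in a column; `default` on a tie (an assumption)."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]


def two_rounds(own, round1, round2):
    """Return {node: final vector}; round1/round2 model what each node sends."""
    received = {j: {n: (own[j] if n == j else round1(n, j)) for n in NODES} for j in NODES}
    final = {}
    for j in NODES:
        vector = []
        for n in NODES:
            column = [received[j][n]]
            if n != j:  # relays from every node i with i, j, n pairwise distinct
                column += [round2(i, j, n, received[i][n]) for i in NODES if i not in (j, n)]
            vector.append(majority(column))
        final[j] = vector
    return final


def suspects(vector):
    """Nodes whose agreed value disagrees with the majority of the final vector."""
    agreed = majority(vector)
    return {n for n, v in zip(NODES, vector) if v != agreed}


own = {1: 1, 2: 1, 3: 1, 4: 1}  # node 4's own value "x" plays no role below


def round1(n, j):
    # Byzantine fault in round 1: node 4 sends "0" to nodes 1 and 3, "1" to node 2.
    return {1: 0, 2: 1, 3: 0}[j] if n == FAULTY else own[n]


behaviours = {
    "honest relay": lambda i, j, n, v: v,
    "inverted relay": lambda i, j, n, v: 1 - v if i == FAULTY else v,
}
results = {label: two_rounds(own, round1, r2) for label, r2 in behaviours.items()}
for label, final in results.items():
    for j in (1, 2, 3):
        print(f"{label}: node {j} -> {final[j]}, suspects {suspects(final[j])}")
```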

Example 2.3. Finally, consider the case in which a Byzantine fault appears in the second intercomputer communication round. Suppose that the computational results of all nodes are logical “1”.

figure k

After the first round, the following vectors are formed.

figure l

Suppose that a Byzantine fault appears in the fourth node during the second intercomputer communication round, so that the faulty node sends different computational results to different non-faulty nodes. After the second round, the following matrices are formed.

figure m

After majority voting in columns, the following final vectors are formed.

figure n

As we see, the matrices in the non-faulty nodes contain different computational results. Despite this, all non-faulty nodes form the same final vector, in which all computational results are equal. The non-faulty nodes therefore conclude that there is no fault in the RCS. This means that the Determinate Byzantine agreement algorithm masks a Byzantine fault that appears in the second round. The same happens when a Non-Byzantine fault appears in the RCS during the second intercomputer communication round. As a result, Byzantine agreement algorithms mask both Non-Byzantine and Byzantine fault appearance forms that occur during the second and following rounds. Masking of faults is very dangerous, because masked faults can accumulate and lead the system to failure.
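The masking effect of Example 2.3 can be checked with the same hypothetical model, repeated so the block is self-contained. Node 4 behaves honestly in the first round and misbehaves only while relaying in the second.

The relayed values used here are assumptions:

- Byzantine case: node 4 relays “0” to nodes 1 and 3 and “1” to node 2.
- Non-Byzantine case: node 4 relays “0” to every node.

Every non-faulty node still agrees on \((1, 1, 1, 1)\) and nothing is flagged, so the fault is masked in both cases.

```python
from collections import Counter

NODES = [1, 2, 3, 4]
FAULTY = 4


def majority(values, default=0):
    """Most frequent value in a column; `default` on a tie (an assumption)."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]


def two_rounds(own, round1, round2):
    """Return {node: final vector}; round1/round2 model what each node sends."""
    received = {j: {n: (own[j] if n == j else round1(n, j)) for n in NODES} for j in NODES}
    final = {}
    for j in NODES:
        vector = []
        for n in NODES:
            column = [received[j][n]]
            if n != j:  # relays from every node i with i, j, n pairwise distinct
                column += [round2(i, j, n, received[i][n]) for i in NODES if i not in (j, n)]
            vector.append(majority(column))
        final[j] = vector
    return final


def suspects(vector):
    """Nodes whose agreed value disagrees with the majority of the final vector."""
    agreed = majority(vector)
    return {n for n, v in zip(NODES, vector) if v != agreed}


own = {1: 1, 2: 1, 3: 1, 4: 1}


def honest_round1(n, j):
    # The fault has not appeared yet: every node sends its true result.
    return own[n]


behaviours = {
    # Byzantine relay: node 4 sends "0" to nodes 1 and 3 and "1" to node 2.
    "Byzantine": lambda i, j, n, v: {1: 0, 2: 1, 3: 0}[j] if i == FAULTY else v,
    # Non-Byzantine relay: node 4 sends the same wrong value "0" to everybody.
    "Non-Byzantine": lambda i, j, n, v: 0 if i == FAULTY else v,
}
results = {label: two_rounds(own, honest_round1, r2) for label, r2 in behaviours.items()}
for label, final in results.items():
    for j in (1, 2, 3):
        print(f"{label}: node {j} -> {final[j]}, suspects {suspects(final[j]) or 'none'}")
```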

Example 3

Let us show that a Byzantine agreement algorithm (for example, the Determinate Byzantine agreement algorithm) cannot counteract a Byzantine fault in an RCS with \(N=3\) (Fig. 1 with \(N=3\)). Suppose that the third node is faulty and the computational results of the nodes are “1”, “1” and “x” (“0” or “1”).

figure o

Suppose that during the first intercomputer communication round the faulty node sends logical “0” to the first node and logical “1” to the second node. After the first round, the following vectors are formed.

figure p

After the second round, the following matrices are formed.

figure q

After majority voting over the columns, the following final vectors are formed.

figure r

As we see, the non-faulty nodes form different final vectors and cannot determine the faulty node. Hence, Byzantine agreement algorithms cannot detect a Byzantine fault in an RCS with \(N=3\).
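For \(N=3\), each column of a non-faulty node's matrix holds only two values, so a single inconsistent value is enough to tie the vote. The sketch below uses the same hypothetical model but reports a tie as `None` instead of resolving it with a default value. The inverting second-round behaviour of node 3 is an assumption.

Both non-faulty nodes end up with tied columns and different final vectors, so neither can tell which node is faulty. This is consistent with the well-known requirement \(N \ge 3k + 1\) for Byzantine agreement.

```python
from collections import Counter

NODES = [1, 2, 3]
FAULTY = 3


def majority(values):
    """Strict majority of a column, or None when the vote is tied."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def two_rounds(own, round1, round2):
    """Return {node: final vector}; round1/round2 model what each node sends."""
    received = {j: {n: (own[j] if n == j else round1(n, j)) for n in NODES} for j in NODES}
    final = {}
    for j in NODES:
        vector = []
        for n in NODES:
            column = [received[j][n]]
            if n != j:  # with N = 3 only one other node can relay
                column += [round2(i, j, n, received[i][n]) for i in NODES if i not in (j, n)]
            vector.append(majority(column))
        final[j] = vector
    return final


own = {1: 1, 2: 1, 3: 1}


def round1(n, j):
    # Node 3 sends "0" to node 1 and "1" to node 2.
    return {1: 0, 2: 1}[j] if n == FAULTY else own[n]


def round2(i, j, n, v):
    # Hypothetical round-2 behaviour: node 3 inverts every value it relays.
    return 1 - v if i == FAULTY else v


final = two_rounds(own, round1, round2)
print("node 1:", final[1])
print("node 2:", final[2])
```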

Example 4

Consider an RCS in which both Non-Byzantine and Byzantine fault types may occur (Fig. 12).

Fig. 12. Example for RCS using Protocol #II

Suppose that the \(n^{th}\) (\(n=1, 2, 3, 4\)) computer (node) in the RCS consists of a Central Processor (CP), an Input Processor (IP) and an Output Processor (OP), as shown in Fig. 12.

Each CP controls its own computational process, computes its own computational result and executes its own fault-tolerant procedure on the basis of the IDS, which consists of the computational results of all nodes in the RCS. Each IP consists of four receivers (R1, R2, R3 and R4), which receive the computational results from the other nodes. Each OP consists of four transmitters (T1, T2, T3 and T4), which transmit the computational results to the other nodes.

On the one hand, in this protocol, the OP of the \(n^{th}\) node (\(n=1, 2, 3, 4\)) transmits its computational result in parallel to the IPs of the \(i^{th}\) nodes (\(i=1, 2, 3, 4\), \(i \ne n\)) over three busses. The nodes execute this procedure in sequential order, using the three busses in time-sharing mode. For example:

  • T1, T2 and T3 of the OP of the \(1^{st}\) node transmit its computational result to the other nodes at instant \(t_1\), using three busses in parallel;

  • T1, T2 and T3 of the OP of the \(2^{nd}\) node transmit its computational result to the other nodes at instant \(t_2\), using three busses in parallel;

  • T1, T2 and T3 of the OP of the \(3^{rd}\) node transmit its computational result to the other nodes at instant \(t_3\), using three busses in parallel;

  • T1, T2 and T3 of the OP of the \(4^{th}\) node transmit its computational result to the other nodes at instant \(t_4\), using three busses in parallel.

On the other hand, in this protocol, the IP of the \(n^{th}\) node (\(n=1, 2, 3, 4\)) receives the computational results from the OPs of the \(i^{th}\) nodes (\(i=1, 2, 3, 4\), \(i \ne n\)) over one bus in time-sharing mode. All nodes execute this procedure in parallel. For example:

  • R1 of the IP of the \(1^{st}\) node receives the computational results from T1 of the \(2^{nd}\), \(3^{rd}\) and \(4^{th}\) nodes at instants \(t_2\), \(t_3\) and \(t_4\), respectively, over one bus;

  • R1 of the IP of the \(2^{nd}\) node receives the computational results from T1 of the \(1^{st}\) node and from T2 of the \(3^{rd}\) and \(4^{th}\) nodes at instants \(t_1\), \(t_3\) and \(t_4\), respectively, over one bus;

  • R1 of the IP of the \(3^{rd}\) node receives the computational results from T2 of the \(1^{st}\) and \(2^{nd}\) nodes and from T3 of the \(4^{th}\) node at instants \(t_1\), \(t_2\) and \(t_4\), respectively, over one bus;

  • R1 of the IP of the \(4^{th}\) node receives the computational results from T3 of the \(1^{st}\), \(2^{nd}\) and \(3^{rd}\) nodes at instants \(t_1\), \(t_2\) and \(t_3\), respectively, over one bus.
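The Protocol #II timetable can be generated rather than listed. The rule below is inferred from the example and is an assumption: node \(n\) uses transmitter T\(k\) for its \(k\)-th other node in ascending order. The function name is hypothetical.

The output lists, for each node's R1, which transmitter it hears at which instant. Node 1 necessarily hears node 2 at \(t_2\), the instant at which node 2 transmits. It also shows that each sender drives three separate transmitters, which is what allows a faulty OP to send different values to different nodes.

```python
NODES = [1, 2, 3, 4]


def protocol_ii_schedule(nodes):
    """Return (instant, sender, transmitter, destination) tuples for Protocol #II.

    Assumed rule, inferred from the example: at instant t_n node n drives three
    transmitters in parallel, T1 towards its first other node, T2 towards the
    second and T3 towards the third (in ascending node order).
    """
    schedule = []
    for n in nodes:
        others = [i for i in nodes if i != n]
        for k, dest in enumerate(others, start=1):
            schedule.append((f"t{n}", n, f"T{k}", dest))
    return schedule


schedule = protocol_ii_schedule(NODES)

# Every IP listens on its single receiver R1, one sender per time slot.
receptions = {
    node: [(t, tx, src) for t, src, tx, dest in schedule if dest == node]
    for node in NODES
}
for node, heard in receptions.items():
    print(f"R1 of node {node}:", heard)

# Each sender reaches its peers through three separate transmitters.
transmitters = {n: sorted({tx for _, src, tx, _ in schedule if src == n}) for n in NODES}
print(transmitters)
```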

We assume that at most one fault may occur in the RCS at any instant (Sect. 2.4). Under this assumption, three cases are possible:

  1. Fault occurs in the IP of the \(n^{th}\) node (\(n=1, 2, 3, 4\)); under the assumption, its CP and OP are non-faulty. The functions of the IP are to receive the computational results from the other nodes and to store them in its buffer memory.

    • The faulty IP may alter the correct computational results received from the \(i^{th}\) nodes (\(i=1, 2, 3, 4\), \(i \ne n\)) in the first and following communication rounds, and so store correct or incorrect computational results in its buffer memory.

    • The non-faulty CP computes its own correct computational result.

    • The non-faulty OP sends the same (correct) computational result to the other nodes in the first communication round, since this result is computed by the non-faulty CP and does not pass through the faulty IP. In the second and following rounds, it sends the same (possibly incorrect) computational results, namely those received, and possibly altered, by the faulty IP.

    In this case, the transmitters of the non-faulty OP send the same computational results (whether correct or incorrect) to all other nodes in every communication round. Consequently, if a fault occurs in the IP, only the Non-Byzantine fault type can appear in the above RCS structure.

  2. Fault occurs in the CP of the \(n^{th}\) node; under the assumption, its IP and OP are non-faulty. The functions of the CP are to control its own computational process, to compute its own computational result and to execute its own fault-tolerant procedure on the basis of the IDS, which consists of the computational results of all nodes in the RCS.

    • The non-faulty IP receives the same (correct) computational results from the other nodes in the first and following communication rounds.

    • The faulty CP may compute an incorrect computational result of its own.

    • The non-faulty OP sends the same (possibly incorrect) computational result, computed by the faulty CP of the \(n^{th}\) node, in the first communication round. In the second and following communication rounds, it sends the same (correct) computational results, received by the non-faulty IP.

    In this case, the transmitters also send the same computational results (whether correct or incorrect) to all other nodes in every communication round. Consequently, if a fault occurs in the CP, only the Non-Byzantine fault type can appear in the above RCS structure.

  3. Fault occurs in the OP of the \(n^{th}\) node; under the assumption, its CP and IP are non-faulty. The main function of the OP is to transmit the computational results to the other nodes.

    • The non-faulty IP receives the correct computational results from the other nodes in the first and following communication rounds.

    • The non-faulty CP computes its own correct computational result.

    • The faulty OP may send different computational results to different nodes in any communication round. This is because it uses multiple transmitters (three in this case), and one or more of them may be faulty and alter the transmitted values.

    Consequently, if a fault occurs in the OP, the fault type appearing in the above RCS structure may be either Non-Byzantine or Byzantine.
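The three cases can be condensed into a small binary model. This is a hypothetical sketch, not the chapter's formalism, and it rests on three assumptions:

- a faulty unit either passes a value or flips it;
- a faulty IP or CP corrupts a value before it reaches the OP, so all transmitters copy the same value;
- a faulty OP may corrupt each of its three transmitters independently.

A fault can be Byzantine when some delivered vector mixes different values.

```python
from itertools import product


def delivered_vectors(faulty_component, n_transmitters, correct=1):
    """All value vectors that one node might deliver to its peers in one round.

    Hypothetical binary model: a faulty unit either passes a value or flips it.
    A faulty IP or CP corrupts a value before it reaches the OP, so all
    transmitters copy the same (possibly wrong) value.  A faulty OP may corrupt
    each of its transmitters independently.
    """
    options = (correct, 1 - correct)
    if faulty_component in ("IP", "CP"):
        return {(v,) * n_transmitters for v in options}
    if faulty_component == "OP":
        return set(product(options, repeat=n_transmitters))
    raise ValueError(f"unknown component: {faulty_component}")


def fault_types(faulty_component, n_transmitters):
    """Non-Byzantine only, unless some peers may receive different values."""
    vectors = delivered_vectors(faulty_component, n_transmitters)
    if any(len(set(v)) > 1 for v in vectors):
        return "Non-Byzantine or Byzantine"
    return "Non-Byzantine only"


# Protocol #II: each OP drives three transmitters in parallel.
cases = {c: fault_types(c, n_transmitters=3) for c in ("IP", "CP", "OP")}
print(cases)
```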

Now let us change the connections between the nodes of the RCS in Fig. 12 so that the Byzantine fault type cannot occur (Fig. 13).

Fig. 13. Example for RCS using Protocol #IV (Byzantine fault free protocol)

On the one hand, in this protocol, only one transmitter of the OP of the \(n^{th}\) node (\(n=1, 2, 3, 4\)) transmits its computational result, simultaneously, to the IPs of all other nodes in broadcast mode over one bus. The nodes execute this procedure in sequence. For example:

  • T1 of the OP of the \(1^{st}\) node broadcasts its computational result to the other nodes at instant \(t_1\) over one bus;

  • T1 of the OP of the \(2^{nd}\) node broadcasts its computational result to the other nodes at instant \(t_2\) over one bus;

  • T1 of the OP of the \(3^{rd}\) node broadcasts its computational result to the other nodes at instant \(t_3\) over one bus;

  • T1 of the OP of the \(4^{th}\) node broadcasts its computational result to the other nodes at instant \(t_4\) over one bus.

On the other hand, the IP of the \(n^{th}\) node (\(n=1, 2, 3, 4\)) receives the computational results from the OPs of the \(i^{th}\) nodes (\(i=1, 2, 3, 4\), \(i \ne n\)) in sequential order over \(N-1\) busses. All nodes execute this procedure in parallel. For example:

  • R1, R2 and R3 of the IP of the \(1^{st}\) node receive the computational results from the \(2^{nd}\), \(3^{rd}\) and \(4^{th}\) nodes at instants \(t_2\), \(t_3\) and \(t_4\), respectively, over three different busses;

  • R1, R2 and R3 of the IP of the \(2^{nd}\) node receive the computational results from the \(1^{st}\), \(3^{rd}\) and \(4^{th}\) nodes at instants \(t_1\), \(t_3\) and \(t_4\), respectively, over three different busses;

  • R1, R2 and R3 of the IP of the \(3^{rd}\) node receive the computational results from the \(1^{st}\), \(2^{nd}\) and \(4^{th}\) nodes at instants \(t_1\), \(t_2\) and \(t_4\), respectively, over three different busses;

  • R1, R2 and R3 of the IP of the \(4^{th}\) node receive the computational results from the \(1^{st}\), \(2^{nd}\) and \(3^{rd}\) nodes at instants \(t_1\), \(t_2\) and \(t_3\), respectively, over three different busses.
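The Protocol #IV timetable can be generated in the same way. The receiver-numbering rule is inferred from the example and is an assumption: node \(j\) listens to its \(k\)-th other node (in ascending order) on receiver R\(k\). The function name is hypothetical. Each sender uses only T1, so at every instant a single transmitter is active and its one output reaches all destinations.

```python
NODES = [1, 2, 3, 4]


def protocol_iv_schedule(nodes):
    """Return (instant, sender, destination, receiver) tuples for Protocol #IV.

    At instant t_n node n broadcasts through its single transmitter T1.
    Assumed rule, inferred from the example: each destination listens to its
    k-th other node (ascending order) on its own receiver Rk, i.e. on a
    separate bus.
    """
    schedule = []
    for n in nodes:
        for dest in nodes:
            if dest == n:
                continue
            others = [i for i in nodes if i != dest]
            schedule.append((f"t{n}", n, dest, f"R{others.index(n) + 1}"))
    return schedule


schedule = protocol_iv_schedule(NODES)
receptions = {
    node: [(t, rx, src) for t, src, dest, rx in schedule if dest == node]
    for node in NODES
}
for node, heard in receptions.items():
    print(f"IP of node {node}:", heard)
```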

Under the same assumption, the same three cases are possible here. The first and second cases are analyzed exactly as for the previous protocol. The difference lies in the third case: because broadcast mode is used, the single transmitter of the faulty OP delivers the same computational result (whether correct or incorrect) to all other nodes.

Consequently, even if a fault occurs in the OP, only the Non-Byzantine fault type can appear in this RCS structure.

Thus, by changing the connections between nodes, we obtain an RCS in which only the Non-Byzantine fault type can occur. In other words, choosing the mode of intercomputer communication appropriately blocks the occurrence of the Byzantine fault type in the RCS.
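A small binary model with an explicit communication mode summarizes both protocols. Again, this is a hypothetical sketch, with made-up function names and the assumption that a faulty unit either passes or flips a value.

- In point-to-point mode (Protocol #II), a faulty OP may corrupt each per-peer transmitter independently.
- In broadcast mode (Protocol #IV), a single transmitter puts one value on a bus that every peer reads, so all peers receive the same value whichever unit is faulty.

```python
from itertools import product


def delivered_vectors(faulty_component, mode, n_peers=3, correct=1):
    """All value vectors one node might deliver to its peers in one round.

    Hypothetical binary model: a faulty unit either passes or flips a value.
    mode is "point-to-point" (Protocol #II: one transmitter per peer) or
    "broadcast" (Protocol #IV: one transmitter on a bus read by all peers).
    """
    options = (correct, 1 - correct)
    if faulty_component == "OP" and mode == "point-to-point":
        # Each transmitter of a faulty OP may be corrupted independently.
        return set(product(options, repeat=n_peers))
    # A faulty IP or CP, or a faulty OP with a single broadcast transmitter,
    # still puts one value on the wire, so every peer receives the same value.
    return {(v,) * n_peers for v in options}


def fault_types(faulty_component, mode):
    """Non-Byzantine only, unless some peers may receive different values."""
    vectors = delivered_vectors(faulty_component, mode)
    if any(len(set(v)) > 1 for v in vectors):
        return "Non-Byzantine or Byzantine"
    return "Non-Byzantine only"


table = {
    (protocol, component): fault_types(component, mode)
    for protocol, mode in (("#II", "point-to-point"), ("#IV", "broadcast"))
    for component in ("IP", "CP", "OP")
}
for key, value in table.items():
    print(key, value)
```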


Copyright information

© 2017 Springer-Verlag GmbH Germany

About this chapter

Cite this chapter

Samet, R., Samet, N. (2017). Analysis of Relationship Between Modes of Intercomputer Communications and Fault Types in Redundant Computer Systems. In: Gavrilova, M., Tan, C. (eds) Transactions on Computational Science XXIX. Lecture Notes in Computer Science, vol 10220. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-54563-8_1


  • DOI: https://doi.org/10.1007/978-3-662-54563-8_1


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-54562-1

  • Online ISBN: 978-3-662-54563-8

  • eBook Packages: Computer Science, Computer Science (R0)
