Analyzing the reliability of shuffle-exchange networks using reliability block diagrams

https://doi.org/10.1016/j.ress.2014.07.012

Highlights

  • The impact of increasing the number of stages on reliability of MINs is investigated.

  • The reliability block diagram (RBD) method, an accurate method, is used for the reliability analysis of MINs.

  • Complex series–parallel RBDs are used to determine the reliability of the MINs.

  • All measures of the reliability (i.e. terminal, broadcast, and network reliability) are analyzed.

  • All reliability equations are evaluated for different network sizes N×N.

Abstract

Supercomputers and multi-processor systems comprise thousands of processors that need to communicate efficiently. One reasonable solution is to use multistage interconnection networks (MINs), where the challenge is to analyze the reliability of such networks. One method of increasing the reliability and fault-tolerance of MINs is to add switching stages. Accordingly, the reliability of one of the most common MINs, the shuffle-exchange network (SEN), has recently been evaluated by investigating the impact of increasing the number of switching stages. That evaluation concluded that SEN with one additional stage (SEN+) is more reliable than both SEN and SEN with two additional stages (SEN+2), and that SEN is in turn more reliable than SEN+2. Here we re-evaluate the reliability of these networks; the results of the terminal, broadcast, and network reliability analyses demonstrate that SEN+ and SEN+2 consistently outperform SEN and are very similar to each other in terms of reliability.

Introduction

Demands for ever more computing power have never ceased. Although processor performance doubled approximately every three years from 1980 to 1996, some important problems emerged whose solutions would require a huge amount of computing power. In 1987, the U.S. government Office of Science and Technology Policy defined many grand challenge problems as fundamental applications in science or engineering whose solutions would be enabled by high-performance computing resources that could become available in the near future. Some of these problems include:

  • Computational fluid dynamics: long-range weather prediction, global climate change, computational ocean sciences, enhanced oil and gas recovery, nuclear reactor design, automobile and hypersonic aircraft design, and quiet submarines.

  • Electronic structure calculations for the design of new materials: chemical catalysts, immunological agents, drug design, human genome, semiconductors, and superconductors.

  • Plasma dynamics for fusion energy and military applications: nuclear fusion, combustion systems, and air, sea, and undersea surveillance for safe and efficient military technology.

  • Calculations to understand the fundamentals of matter: quantum chromodynamics, astrophysics, structural analysis, seismology, and condensed matter theory.

  • Symbolic computation: speech recognition, natural language processing, computer vision, image processing, automated reasoning, data mining for modeling business and financial processes, and discrete and continuous simulations of design, manufacturing, and production issues, e.g., in transportation systems.

In order to solve these challenging problems, the goal is to obtain computer systems capable of $10^{12}$ floating-point operations per second (teraflops). Even the smallest of these problems requires gigaflops of performance for hours at a time, and the larger problems require teraflops of performance for more than a thousand hours at a time [1], [2].

Single-processor supercomputers have achieved unheard-of speeds and have been pushing hardware technology to the physical limits of cheap manufacturing. However, this trend will soon come to an end, because there are physical and architectural bounds that limit the computational power of a single-processor system. Given the processing speed limitation of sequential computers and the easy and cost-effective availability of VLSI technology, employing a large number of processors to accomplish a given computation is an alternative. Parallel computers with multiple processors are opening the door to teraflops computing performance to meet the increasing demand for computational power. The main argument for using multi-processors is that powerful computers can be created by simply connecting multiple processors. A multi-processor system is expected to reach a higher speed than the fastest single-processor system. Furthermore, a multi-processor system consisting of a number of single processors is expected to be more cost-effective than building a high-performance single processor. Another advantage of a multi-processor system is fault-tolerance: if a processor fails, the remaining processors should be able to provide continued service, albeit with degraded performance [1], [3], [4].

Given the above discussion, several questions arise: How do these processing elements communicate and cooperate? How should data be transmitted between individual processors? What sort of interconnection is provided? In other words, a parallel computer requires some kind of communication sub-system to interconnect processors, memories, disks, and other peripherals. Communication between different nodes is the responsibility of the interconnection network. An interconnection network is a system of switches and links that connects N input channels to M output channels and can be used for internal connections among processors, memory modules, and I/O devices. Therefore, the design of an efficient interconnection network is critical for the construction of efficient multi-processor systems [1], [5].

With the increasing number of nodes in a supercomputer environment, the desirable option appears to be the use of multistage interconnection networks (MINs), since MINs are able to provide good performance at a relatively low cost [1], [6], [7]. Therefore, MINs are often used in SIMD (single-instruction multiple-data) and MIMD (multiple-instruction multiple-data) parallel machines and are also increasingly adopted for implementing the switching fabric of high-capacity communication processors, including ATM switches, gigabit Ethernet switches, and terabit routers [7], [8]. For instance, MINs are used to connect the nodes of the IBM SP [9] and CRAY X-MP series [10].

Generally, a MIN's structure consists of two parts: sources (inputs) and destinations (outputs). These sources and destinations are connected by multi-stage switching elements so that all sources have access to all destinations [1], [11].

MINs can be divided into two categories: single-path MINs and multiple-path MINs. Generally, single-path MINs are built from switching elements of size 2×2, which helps to reduce hardware costs. A single-path MIN of size N×N has $\log_2 N$ switching stages, and each stage contains $N/2$ switching elements. The network complexity (the total number of switching elements) of a single-path MIN is therefore $(N/2)\log_2 N$, which compares favorably with the $N^2$ complexity of the crossbar [11], [12], [13]. Thanks to this low cost, many single-path networks, such as the shuffle-exchange network (SEN) [14], Baseline [15], and Generalized Cube [16], are an excellent choice for large-scale systems. A major problem with this kind of MIN, however, is that there is only one path between each source-destination pair. Thus, if one of the switches on the path becomes unavailable (faulty or busy), the entire network collapses. The solution is to increase the fault-tolerance [13], [17], [18].
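The cost comparison above can be checked numerically. The following is a minimal sketch, assuming N is a power of two; the function names are hypothetical and chosen only for illustration.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a power-of-two n, rejecting any other input."""
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("network size N must be a power of two, N >= 2")
    return n.bit_length() - 1


def single_path_min_switches(n: int) -> int:
    """Total 2x2 switching elements: log2(N) stages of N/2 switches each."""
    return (n // 2) * log2_exact(n)


def crossbar_crosspoints(n: int) -> int:
    """Network complexity of an N x N crossbar, taken as N^2."""
    return n * n


for size in (8, 64, 1024):
    print(size, single_path_min_switches(size), crossbar_crosspoints(size))
```

For N = 1024, the single-path MIN needs 5120 switching elements against 1,048,576 crosspoints for the crossbar, which illustrates why the single-path design is attractive for large systems.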

The basic idea of MIN fault-tolerance is to increase the number of paths between each source-destination pair, so that alternate paths can be taken in case of unavailability. A method to create redundancy in single-path MINs is to increase the number of switching stages [11], [14], [18], [19].

SEN is a single-path MIN; its double-path version is named SEN+ (with one additional stage), and its quadruple-path version is SEN+2 (with two additional stages). A previous analysis of the three networks SEN, SEN+, and SEN+2 reports that the reliability of SEN+ is higher than that of the other two networks and that the reliability of SEN is much higher than that of SEN+2 [20]. These results are quite surprising because, as mentioned, single-path MINs are the most vulnerable MINs and fail with a single fault (the minimum fault possible), so it was unexpected for SEN to be more reliable than SEN+2. The main reason given for the poor results of SEN+2 is its higher network complexity in comparison with the other two [20]. It should be noted that these results are very influential and can be extended to other types of single-path MINs, because these MINs are topologically equivalent [11], [19], [21].
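To make the trade-off between redundancy and network complexity concrete, the sketch below tabulates the three networks. It rests on two assumptions that are not stated explicitly in the text: each additional stage contains N/2 switches, like the original stages, and the number of paths doubles with each additional stage (consistent with the double-path and quadruple-path descriptions). The function name is hypothetical.

```python
def sen_variant_profile(n: int, extra_stages: int) -> dict:
    """Stage count, switch count, and path count for SEN with extra stages.

    Assumes N is a power of two, every stage holds N/2 switches, and each
    additional stage doubles the number of source-destination paths.
    """
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("network size N must be a power of two, N >= 2")
    stages = (n.bit_length() - 1) + extra_stages
    return {
        "stages": stages,
        "switches": (n // 2) * stages,
        "paths": 2 ** extra_stages,
    }


for name, extra in (("SEN", 0), ("SEN+", 1), ("SEN+2", 2)):
    print(name, sen_variant_profile(8, extra))
```

Under these assumptions, each extra stage in an 8×8 network adds four switches while doubling the number of available paths.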

In this paper, we analyze three reliability parameters (terminal, broadcast, and network reliability) for SEN, SEN+, and SEN+2. As we shall observe, the analytical results indicate that SEN+2 and SEN+ perform very similarly in terms of reliability, and that both are always more reliable than the single-path SEN.

The rest of this paper is organized as follows: the motivation, an overview of related work, the contribution, and the structure of SEN, SEN+, and SEN+2 are presented in Section 2. The three reliability parameters (terminal, broadcast, and network reliability) are evaluated for SEN, SEN+, and SEN+2 in Sections 3, 4, and 5, respectively. Finally, some conclusions are drawn in Section 6.

Section snippets

Motivation

The main approach for improving the fault-tolerance of MINs is to create redundant paths between each source-destination pair, and increasing the number of stages is one of the major ideas for creating such redundancy [14], [18], [19]. However, reliability should be considered as well. The question that arises is what impact increasing the number of stages has on reliability. Previous analyses have shown that one extra stage has positive impact on the reliability

Terminal reliability of SEN, SEN+, and SEN+2

Terminal reliability shows the reliability between pairs of sources and destinations, defined as the probability of the occurrence of at least one fault-free path between a source–destination pair.

SEN is a single-path MIN, so all switches on the path between a source-destination pair are vital. The terminal reliability RBD of an N×N SEN is shown in Fig. 4.

Let r be the probability of a switch being operational. The terminal reliability of an N×N SEN is given by Eq. (1):

$$R_t(\mathrm{SEN}) = r^{\log_2 N} \qquad (1)$$
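As an illustration of how a series RBD leads to Eq. (1), the sketch below implements the two standard RBD combinators and evaluates the series chain of log2 N switches for SEN. It assumes independent switch failures and a power-of-two N; the helper names are hypothetical, and the parallel combinator is included only as the generic building block for redundant paths, not as the paper's formula for SEN+ or SEN+2.

```python
import math


def series(*reliabilities: float) -> float:
    """Series RBD: every block must work, so reliabilities multiply."""
    return math.prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """Parallel RBD: the structure fails only if every block fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def terminal_reliability_sen(n: int, r: float) -> float:
    """Terminal reliability of an N x N SEN: log2(N) switches in series."""
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("network size N must be a power of two, N >= 2")
    stages = n.bit_length() - 1
    return series(*[r] * stages)


for size in (8, 64, 1024):
    print(size, terminal_reliability_sen(size, 0.95))
```

Because the chain grows by one switch each time N doubles, the terminal reliability of SEN falls steadily with network size for any r below 1.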

SEN+ is a double-path

Broadcast reliability of SEN, SEN+, and SEN+2

Broadcast reliability represents the reliability between a given source and all destinations, defined as the probability of successful communication between the source and all network destinations. Therefore, according to the network structure as discussed earlier, all switches in the last stage are vital in this type of reliability. With respect to Fig. 1, broadcast reliability RBD of N×N SEN is shown in Fig. 8.

In Fig. 8, K is calculated using Eq. (11):

$$K = \sum_{i=1}^{\log_2 N} \frac{N}{2^i} \qquad (11)$$
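Reading Eq. (11) as the sum of N/2^i for i from 1 to log2 N, K is a geometric sum that collapses to N − 1. The sketch below checks this numerically, assuming N is a power of two; the function name is hypothetical.

```python
def broadcast_switch_count(n: int) -> int:
    """Evaluate K = sum_{i=1}^{log2 N} N / 2^i for a power-of-two N."""
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("network size N must be a power of two, N >= 2")
    stages = n.bit_length() - 1
    # Each term N / 2^i is an exact integer because N is a power of two.
    return sum(n // 2 ** i for i in range(1, stages + 1))


for size in (2, 8, 64, 1024):
    print(size, broadcast_switch_count(size))
```

The terms run from N/2 down to 1, so K counts N − 1 switches in total; for N = 8 this gives 4 + 2 + 1 = 7.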

Broadcast reliability of N×N

Network reliability of SEN, SEN+, and SEN+2

Network reliability represents the reliability of the connections between all sources and all destinations, defined as the probability of successful communication between all sources and all network destinations. Therefore, according to the network structure as discussed, all switches in the first and last stages are critical in this type of reliability. With respect to Fig. 2, network reliability RBD of N×N SEN is shown in Fig. 12.

Network reliability of N×N SEN is calculated by the Eq. (25).Rn(

Conclusions and future works

In this paper, we analyzed the reliability of three networks, SEN, SEN+, and SEN+2, from three perspectives (terminal, broadcast, and network reliability) in order to investigate the impact of increasing the number of switching stages on the reliability of MINs. The terminal, broadcast, and network reliability analyses all yield almost the same results. These results demonstrate that SEN+ and SEN+2 perform the same in terms of reliability. Also, the reliability of these networks is always more than the

Acknowledgment

The authors would like to thank the editor and anonymous reviewers whose helpful comments improved the quality of this paper.

References (55)

  • Won-Hee Kang et al.

    A rapid reliability estimation method for directed acyclic lifeline networks with statistically dependent components

    Reliab Eng Syst Saf

    (2014)
  • Youngsuk Kim et al.

    Network reliability analysis of complex systems using a non-simulation-based method

    Reliab Eng Syst Saf

    (2013)
  • N. Padmavathy et al.

    Evaluation of mobile ad hoc network reliability using propagation-based link reliability model

    Reliab Eng Syst Saf

    (2013)
  • Won-Hee Kang et al.

    Matrix-based system reliability method and applications to bridge networks

    Reliab Eng Syst Saf 93(11)

    (2008)
  • Kellie Schneider

    Social network analysis via multi-state reliability and conditional influence models

    Reliab Eng Syst Saf

    (2013)
  • Qing Shuang et al.

    Node vulnerability of water distribution networks under cascading failures

    Reliab Eng Syst Saf

    (2014)
  • Yan-Fu Li et al.

    Non-dominated sorting binary differential evolution for the multi-objective optimization of cascading failures protection in complex networks

    Reliab Eng Syst Saf

    (2013)
  • Mohsen Jahanshahi et al.

    A mathematical formulation for joint channel assignment and multicast routing in multi-channel multi-radio wireless mesh networks

    J Netw Comput Appl

    (2011)
  • Jose Duato et al.

    Interconnection Networks: An Engineering Approach

    (2003)
  • Miltos D. Grammatikakis et al.

    Parallel System Interconnections and Communications

    (2000)
  • Hesham El-Rewini et al.

    Advanced Computer Architecture and Parallel Processing

    (2005)
  • Moreshwar R. Bhujade

    Parallel Computing

    (2004)
  • David E. Culler et al.

    Parallel Computer Architecture: A Hardware/Software Approach

    (1999)
  • William James Dally et al.

    Principles and Practices of Interconnection Networks

    (2004)
  • Dimitris C. Vasiliadis et al.

    Modelling and performance study of finite-buffered blocking multistage interconnection networks supporting natively 2-class priority routing traffic

    J Netw Comput Appl

    (2013)
  • Rudy Lauwereins

    Creating a world of smart re-configurable devices

    Field-Programmable Logic and Applications: Reconfigurable Computing Is Going Mainstream

    (2002)
  • Tony Cheung et al.

    A simulation study of the CRAY X-MP memory system

    IEEE Trans Comput

    (1986)