Abstract
In this paper, the concepts of k-submesh and the k-submesh connectivity fault tolerance model are proposed, and the fault tolerance of 3-D mesh networks is studied under a more realistic model in which each network node has an independent failure probability. It is first observed that if the node failure probability is fixed, then the connectivity probability of 3-D mesh networks can be arbitrarily small when the network size is sufficiently large. Thus, it is practically important for multicomputer system manufacturers to determine the upper bound on the node failure probability when the required probability of network connectivity and the network size are given. A novel technique is developed to formally derive lower bounds on the connectivity probability for 3-D mesh networks. The study shows that 3-D mesh networks of practical size can tolerate a large number of faulty nodes and are thus reliable enough for multicomputer systems. A number of advantages of 3-D mesh networks over other popular network topologies are given. Compared with 2-D mesh networks, 3-D mesh networks are much stronger in tolerating faulty nodes; for practical network sizes, the fault tolerance of 3-D mesh networks is comparable with that of hypercube networks, while the node degree is much lower.
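The independent-failure model described above can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's analytical technique. It assumes that "connected" means all non-faulty nodes of an n × n × n mesh form a single connected component, which may differ from the paper's k-submesh connectivity definition. The function names `nonfaulty_connected` and `estimate_connectivity` and the parameters `trials` and `seed` are hypothetical, introduced only for illustration.

```python
import random
from collections import deque

# The six axis-aligned neighbor offsets of a node in a 3-D mesh.
NEIGHBOR_OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                    (0, -1, 0), (0, 0, 1), (0, 0, -1))


def nonfaulty_connected(n, p, rng):
    """Sample one failure pattern on an n x n x n mesh and report whether
    the surviving (non-faulty) nodes form a single connected component.

    Assumption: each node fails independently with probability p.
    """
    alive = {(x, y, z)
             for x in range(n) for y in range(n) for z in range(n)
             if rng.random() >= p}
    if len(alive) <= 1:
        return True  # convention chosen here: 0 or 1 survivors count as connected

    # Breadth-first search from an arbitrary surviving node.
    start = next(iter(alive))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBOR_OFFSETS:
            nb = (x + dx, y + dy, z + dz)
            if nb in alive and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(alive)


def estimate_connectivity(n, p, trials=200, seed=0):
    """Estimate the connectivity probability as the fraction of sampled
    failure patterns whose surviving nodes are connected."""
    rng = random.Random(seed)
    hits = sum(nonfaulty_connected(n, p, rng) for _ in range(trials))
    return hits / trials
```

Calling `estimate_connectivity(n, p)` for a fixed `p` and increasing `n` gives an empirical view of the first observation in the abstract; the estimates are only as reliable as the number of trials allows and do not replace the formally derived lower bounds.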
Additional information
This research is supported in part by the National Natural Science Foundation of China for Distinguished Young Scholars under Grant No. 69928201, by the Major Research Plan of the National Natural Science Foundation of China under Grant No. 90104028, and by the National Science Foundation of the USA under Grant No. CCR-0000206.
Gao-Cai Wang received the M.S. degree in geographic information systems from Central South University, China, in 2001. Currently, he is a Ph.D. candidate in the Department of Computer Science, College of Information Science and Engineering at Central South University. His research interests include computer networks, routing algorithms, and computer fault tolerance. He has published more than 15 papers in these areas.
Jian-Er Chen received the Ph.D. degree in computer science from the Courant Institute of Mathematical Sciences, New York University (NYU), in 1987. After graduating from NYU, he went to the Department of Mathematics at Columbia University, where he received the Ph.D. degree in mathematics in 1990. Since then, he has been with the Department of Computer Science at Texas A&M University, where he is currently a professor. He also holds a Chang Jiang Scholar Professorship at Central South University, China. His research interests include computational complexity and optimization, graph theory and algorithms, parallel processing and networks, and computer graphics. He has published more than 100 papers in these areas.
Guo-Jun Wang received the M.S. degree and Ph.D. degree in computer science from Central South University, China, in 1996 and 2002, respectively. Currently, he is an associate professor in the Department of Computer Science, College of Information Science and Engineering at Central South University. His research interests include computer networks, routing algorithms, computer fault tolerance, and software engineering. He has published more than 40 papers in these areas.
Cite this article
Wang, GC., Chen, JE. & Wang, GJ. On fault tolerance of 3-dimensional mesh networks. J. Comput. Sci. & Technol. 19, 183–190 (2004). https://doi.org/10.1007/BF02944796
DOI: https://doi.org/10.1007/BF02944796