Abstract
We examine the reliability properties of ideal fat-trees, a general model that captures both the distance and the bandwidth constraints of various classes of fat-tree networks. We allow the edges and the vertices of the network to fail independently with probability f, and show that: (1) Any fat-tree G can be partitioned into an upper part (G_H) and a lower part (G_L). After the faults, the surviving part of G_L guarantees that, with high probability, a linear fraction of the leaves of the fat-tree still connect to the upper part. (2) G_H is robust, in the sense that, after the faults, at least half of the edge-disjoint paths between any set of “leaves” of G_H are preserved with probability tending to 1, for any failure probability f < 0.25. This robustness of G_H holds both when fat-nodes have no internal edges and when fat-nodes are random regular graphs. (3) For the special case of a pruned butterfly, there is a critical probability p_c for the existence of a linear-sized component that survives the failures and includes a large fraction of the terminal nodes. We show that p_c ≥ 0.42.
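The fault model in the abstract (every vertex and every edge fails independently with probability f) can be illustrated with a small Monte Carlo sketch. This is not the paper's ideal fat-tree model or its analysis: it assumes a plain complete binary tree as a toy stand-in, and the function name `surviving_leaf_fraction` and its parameters are hypothetical, chosen only for this illustration.

```python
import random


def surviving_leaf_fraction(depth, f, trials=200, seed=0):
    """Estimate the fraction of leaves that still reach the root.

    Toy assumption: a complete binary tree of the given depth, where every
    vertex and every edge fails independently with probability f.
    """
    rng = random.Random(seed)
    n_leaves = 2 ** depth
    total = 0.0
    for _ in range(trials):
        # reachable[i] is True if node i of the current level survived and
        # is connected to a surviving root through surviving elements.
        reachable = [rng.random() >= f]  # does the root survive?
        for _level in range(depth):
            next_level = []
            for parent_ok in reachable:
                for _child in range(2):
                    edge_ok = rng.random() >= f  # edge parent -> child
                    node_ok = rng.random() >= f  # the child vertex itself
                    next_level.append(parent_ok and edge_ok and node_ok)
            reachable = next_level
        total += sum(reachable) / n_leaves
    return total / trials


if __name__ == "__main__":
    for f in (0.01, 0.05, 0.1):
        print(f, surviving_leaf_fraction(depth=8, f=f, trials=100))
```

In this toy tree a leaf at depth d reaches the root only if all 2d + 1 vertices and edges on its unique path survive, so the expected surviving fraction is (1 - f)^(2d + 1), which shrinks as the tree grows. The sketch does not model the extra capacity of fat-trees, which is the setting the results above address.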
This research was partially supported by the EU Long Term Research Projects GEPPCOM (contract No. 9072) and ALCOM-IT (contract No. 20244).
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Nikoletseas, S., Pantziou, G., Psycharis, P., Spirakis, P. (1997). On the fault tolerance of fat-trees. In: Lengauer, C., Griebl, M., Gorlatch, S. (eds) Euro-Par'97 Parallel Processing. Euro-Par 1997. Lecture Notes in Computer Science, vol 1300. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0002735
DOI: https://doi.org/10.1007/BFb0002735
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63440-9
Online ISBN: 978-3-540-69549-3
eBook Packages: Springer Book Archive