Abstract
The number of processors in high performance computing platforms keeps growing so that larger and more complex problems can be solved. However, as the number of components increases, so does the probability of failure, and logical network topologies must therefore provide fault tolerance in such dynamic environments. This paper presents a self-healing mechanism that improves the fault tolerance of a Binomial graph (BMG) network. The mechanism protects the BMG from network bisection and helps maintain optimal routing even when failures occur. The experimental results show that self-healing with an adaptive method significantly reduces the overhead of reconstructing the network.
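The abstract does not define the Binomial graph. The sketch below is only a rough illustration. It assumes a circulant definition in which node i links to nodes (i + 2^k) mod n and (i - 2^k) mod n for every 2^k < n. It builds such a graph and uses a breadth-first search to check whether a given set of failed nodes bisects the surviving nodes. It is not the paper's self-healing mechanism, and the function names `bmg_neighbors`, `build_bmg`, and `is_connected` are hypothetical.

```python
from collections import deque


def bmg_neighbors(i: int, n: int) -> set[int]:
    """Neighbors of node i in an n-node binomial graph.

    Assumed definition: links to (i + 2**k) mod n and (i - 2**k) mod n
    for every k with 2**k < n (a sketch, not the authors' code).
    """
    nbrs = set()
    k = 0
    while 2 ** k < n:
        nbrs.add((i + 2 ** k) % n)
        nbrs.add((i - 2 ** k) % n)
        k += 1
    nbrs.discard(i)  # no self-loops
    return nbrs


def build_bmg(n: int) -> dict[int, set[int]]:
    """Adjacency sets for every node 0..n-1."""
    return {i: bmg_neighbors(i, n) for i in range(n)}


def is_connected(graph: dict[int, set[int]], failed: set[int]) -> bool:
    """True if the surviving nodes still form a single component (no bisection)."""
    alive = [v for v in graph if v not in failed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:  # breadth-first search over surviving nodes only
        u = queue.popleft()
        for v in graph[u]:
            if v not in failed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(alive)
```

For example, in a 12-node graph built this way, node 0 links to nodes 1, 2, 4, 8, 10, and 11. Losing nodes 1, 2, 10, and 11 leaves the network connected. Losing all six neighbors of node 0 isolates it, which is the kind of bisection a self-healing mechanism would need to prevent.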
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Angskun, T., Bosilca, G., Dongarra, J. (2007). Self-healing in Binomial Graph Networks. In: Meersman, R., Tari, Z., Herrero, P. (eds) On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops. OTM 2007. Lecture Notes in Computer Science, vol 4806. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76890-6_30
DOI: https://doi.org/10.1007/978-3-540-76890-6_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76889-0
Online ISBN: 978-3-540-76890-6
eBook Packages: Computer Science, Computer Science (R0)