Abstract
We present an output-queued switch architecture with cross-point buffering that improves performance for both point-to-point communication and hardware-accelerated collective communication. Output-queued architectures have historically been less popular because they require more internal speedup and buffering. With current technology, however, it is possible to build output-queued switches with a relatively large number of ports. We demonstrate that our output-queued architecture performs well for point-to-point messages, especially in a fat-tree topology. We also show that output-queued architectures facilitate efficient implementations of multicasts and reductions. We report the performance of multicasts and reductions on individual switches and on a network of switches interconnected in a fat-tree topology. We also present simulation results based on synthetic workloads that emulate a molecular dynamics application.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumar, S., Stunkel, C., Kalé, L.V. (2005). Improved Point-to-Point and Collective Communication Performance with Output-Queued High-Radix Routers. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds) High Performance Computing – HiPC 2005. HiPC 2005. Lecture Notes in Computer Science, vol 3769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11602569_44
DOI: https://doi.org/10.1007/11602569_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30936-9
Online ISBN: 978-3-540-32427-0
eBook Packages: Computer Science (R0)