Improved Point-to-Point and Collective Communication Performance with Output-Queued High-Radix Routers

  • Conference paper
High Performance Computing – HiPC 2005 (HiPC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3769)

Abstract

We present an output-queued switch architecture with cross-point buffering that improves performance for both point-to-point communication and hardware-accelerated collective communication. In the past, output-queued architectures have been less popular because they require more internal speedup and buffering. With current technology, however, it is possible to build output-queued switches with a relatively large number of ports. We demonstrate that our output-queued architecture performs well for point-to-point messages, especially in a fat-tree topology. We also show that output-queued architectures facilitate efficient implementations of multicasts and reductions. We report the performance of multicasts and reductions on individual switches and on a network of switches interconnected in a fat-tree topology. We also present simulation results based on synthetic workloads that emulate a molecular dynamics application.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kumar, S., Stunkel, C., Kalé, L.V. (2005). Improved Point-to-Point and Collective Communication Performance with Output-Queued High-Radix Routers. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds) High Performance Computing – HiPC 2005. HiPC 2005. Lecture Notes in Computer Science, vol 3769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11602569_44

  • DOI: https://doi.org/10.1007/11602569_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30936-9

  • Online ISBN: 978-3-540-32427-0

  • eBook Packages: Computer Science, Computer Science (R0)
