
Evaluation of Topology-Aware All-Reduce Algorithm for Dragonfly Networks

  • Conference paper
  • Published in: Network and Parallel Computing (NPC 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13152)

Abstract

Dragonfly is a popular topology for current and future high-speed interconnection networks, and exploiting topology information to accelerate collective operations is an active research area. All-reduce is a key collective communication primitive, widely used in distributed machine learning (DML) and high-performance computing (HPC). The hierarchical structure of the dragonfly topology makes it possible to exploit the low communication latency between adjacent nodes to shorten the completion time of All-reduce operations. In this paper, we propose g-PAARD, a general proximity-aware All-reduce communication scheme for Dragonfly networks. We study the impact of different routing mechanisms on the All-reduce algorithm and their sensitivity to topology size and message size. Our results show that the proposed topology-aware algorithm significantly reduces communication latency while having little impact on the network.
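The full paper is paywalled here, but the general pattern the abstract alludes to — performing the reduction hierarchically so that most traffic stays on cheap, low-latency local links — can be sketched roughly as follows. This is an illustrative outline only, not the authors' g-PAARD algorithm; the grouping scheme and the three phases are assumptions standing in for a dragonfly group of routers sharing local links.

```python
# Illustrative sketch of a hierarchical (proximity-aware) all-reduce.
# NOT the g-PAARD algorithm from the paper -- just the generic pattern:
# reduce inside each low-latency group first, then combine across groups
# over the expensive global links, then broadcast back locally.

def hierarchical_allreduce(values, group_size):
    """values[i] is the contribution of node i; nodes are modeled as
    consecutive blocks of `group_size` sharing fast local links."""
    groups = [values[i:i + group_size]
              for i in range(0, len(values), group_size)]

    # Phase 1: intra-group reduce (local links only).
    partial = [sum(g) for g in groups]

    # Phase 2: inter-group all-reduce among group leaders (global links).
    total = sum(partial)

    # Phase 3: intra-group broadcast of the global result (local links).
    return [total] * len(values)
```

With two groups of three nodes, `hierarchical_allreduce([1, 2, 3, 4, 5, 6], group_size=3)` leaves every node holding the global sum 21, while only the two group leaders ever exchanged data over a global link.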


Notes

  1.

    For example, with standard algorithms, communicating nodes require at least 3 hops, and in some cases up to 6 hops, at each step of the algorithm.
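The 3-versus-6-hop figures in this note follow from the usual dragonfly routing model: a minimally routed packet between nodes in different groups traverses at most local-global-local (3 hops), and non-minimal (Valiant-style) routing through an intermediate group roughly doubles that. A toy calculation, with this simplified routing model as an assumption:

```python
# Toy hop-count model for a dragonfly network, illustrating the
# footnote's 3-vs-6-hop claim. The model is an assumption: one local
# hop inside a group, and local -> global -> local across groups.

def min_hops(src_group, dst_group):
    """Minimal routing between two nodes."""
    if src_group == dst_group:
        return 1          # single local hop inside the group
    return 3              # local -> global -> local

def valiant_hops(src_group, dst_group, via_group):
    """Non-minimal routing through an intermediate group: the path is
    two back-to-back minimal paths, doubling the inter-group cost."""
    return min_hops(src_group, via_group) + min_hops(via_group, dst_group)
```

Under this model `min_hops(0, 1)` is 3 and `valiant_hops(0, 1, 2)` is 6, matching the bounds quoted in the note.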


Acknowledgment

We thank the anonymous reviewers for their insightful comments. We gratefully acknowledge members of the Tianhe interconnect group at NUDT for many inspiring conversations. The work was supported by the National Key R&D Program of China under Grant No. 2018YFB0204300, the Excellent Youth Foundation of Hunan Province (Dezun Dong), and the National Postdoctoral Program for Innovative Talents under Grant No. BX20190091.

Author information

Correspondence to Dezun Dong.


Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper


Cite this paper

Ma, J., Dong, D., Li, C., Wu, K., Xiao, L. (2022). Evaluation of Topology-Aware All-Reduce Algorithm for Dragonfly Networks. In: Cérin, C., Qian, D., Gaudiot, J.L., Tan, G., Zuckerman, S. (eds) Network and Parallel Computing. NPC 2021. Lecture Notes in Computer Science, vol. 13152. Springer, Cham. https://doi.org/10.1007/978-3-030-93571-9_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-93571-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93570-2

  • Online ISBN: 978-3-030-93571-9

  • eBook Packages: Computer Science, Computer Science (R0)
