
Generalisation of Recursive Doubling for AllReduce

Published: 25 September 2016

Abstract

The performance of AllReduce is crucial at scale. Recursive doubling with pairwise exchange theoretically achieves O(log2 N) scaling for short messages across N peers, but its performance is bounded by network latency, which is hard to improve further. A multi-way exchange can instead be implemented using message pipelining, which is easier to improve than latency. Using our method, recursive multiplying, we show reductions of between 8% and 40% in the execution time of AllReduce on a Cray XC30 compared with recursive doubling.
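To make the baseline concrete, here is a minimal sketch of recursive doubling with pairwise exchange for an AllReduce sum, assuming MPI, a power-of-two rank count, and summation over doubles; the function name rd_allreduce_sum and all details beyond the abstract are illustrative assumptions, not taken from the paper.

```c
#include <mpi.h>
#include <stdlib.h>

/* Sketch: recursive doubling AllReduce (sum of doubles).
 * Assumes the communicator size is a power of two. */
static void rd_allreduce_sum(double *buf, int count, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    double *tmp = malloc((size_t)count * sizeof(double));

    /* log2(N) rounds: in round k, exchange the full partial result
     * with the peer whose rank differs only in bit k, then combine. */
    for (int mask = 1; mask < nprocs; mask <<= 1) {
        int peer = rank ^ mask;
        MPI_Sendrecv(buf, count, MPI_DOUBLE, peer, 0,
                     tmp, count, MPI_DOUBLE, peer, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < count; i++)
            buf[i] += tmp[i];
    }
    free(tmp);
}
```

The paper's recursive multiplying generalises each round to a k-way exchange. The abstract gives no implementation details, so the following is a hedged sketch of one plausible round structure, assuming the rank count is an exact power of k and approximating the paper's message pipelining with nonblocking point-to-point operations; rm_allreduce_sum is likewise an invented name.

```c
#include <mpi.h>
#include <stdlib.h>

/* Sketch: k-way "recursive multiplying" AllReduce (sum of doubles).
 * Assumes the communicator size is an exact power of k. */
static void rm_allreduce_sum(double *buf, int count, int k, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    double *tmp = malloc((size_t)(k - 1) * count * sizeof(double));
    MPI_Request *reqs = malloc(2 * (size_t)(k - 1) * sizeof(MPI_Request));

    /* log_k(N) rounds: ranks that agree on every other base-k digit
     * form a group of k; each member exchanges its partial result
     * with the k-1 group peers, overlapping the messages. */
    for (int step = 1; step < nprocs; step *= k) {
        int digit = (rank / step) % k;
        int base  = rank - digit * step;
        int n = 0;
        for (int j = 0; j < k; j++) {
            if (j == digit) continue;
            int peer = base + j * step;
            MPI_Irecv(tmp + (size_t)n * count, count, MPI_DOUBLE,
                      peer, 0, comm, &reqs[2 * n]);
            MPI_Isend(buf, count, MPI_DOUBLE, peer, 0, comm,
                      &reqs[2 * n + 1]);
            n++;
        }
        MPI_Waitall(2 * n, reqs, MPI_STATUSES_IGNORE);
        for (int p = 0; p < n; p++)
            for (int i = 0; i < count; i++)
                buf[i] += tmp[(size_t)p * count + i];
    }
    free(reqs);
    free(tmp);
}
```

With k = 2 this reduces to the recursive doubling sketch above; larger k trades fewer rounds (log_k N instead of log2 N) for k-1 overlapping messages per round, consistent with the abstract's argument that pipelined multi-way exchange is easier to improve than per-message latency.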




Published In

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016
225 pages
ISBN: 9781450342346
DOI: 10.1145/2966884


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AllReduce
  2. Collective
  3. MPI
  4. Message Pipelining
  5. Recursive Doubling
  6. Scalability
  7. n-way

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • EPSRC

Conference

EuroMPI 2016: The 23rd European MPI Users' Group Meeting
September 25 - 28, 2016
Edinburgh, United Kingdom

Acceptance Rates

Overall Acceptance Rate 66 of 139 submissions, 47%
