
Generalisation of Recursive Doubling for AllReduce

Published: 25 September 2016

Abstract

The performance of AllReduce is crucial at scale. Recursive doubling with pairwise exchange theoretically achieves O(log2 N) scaling for short messages across N peers, but its performance is bounded by network latency, which is hard to improve further. A multi-way exchange can instead be implemented using message pipelining, which is easier to improve than latency. Using our method, recursive multiplying, we show reductions of between 8% and 40% in the execution time of AllReduce on a Cray XC30 compared with recursive doubling.
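To make the baseline concrete, here is a minimal sketch of recursive doubling with pairwise exchange for an AllReduce sum, assuming MPI, a power-of-two rank count, and summation over doubles; the function name rd_allreduce_sum and all details beyond the abstract are illustrative assumptions, not taken from the paper.

```c
#include <mpi.h>
#include <stdlib.h>

/* Sketch: recursive doubling AllReduce (sum of doubles).
 * Assumes the communicator size is a power of two. */
static void rd_allreduce_sum(double *buf, int count, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    double *tmp = malloc((size_t)count * sizeof(double));

    /* log2(N) rounds: in round k, exchange the full partial result
     * with the peer whose rank differs only in bit k, then combine. */
    for (int mask = 1; mask < nprocs; mask <<= 1) {
        int peer = rank ^ mask;
        MPI_Sendrecv(buf, count, MPI_DOUBLE, peer, 0,
                     tmp, count, MPI_DOUBLE, peer, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < count; i++)
            buf[i] += tmp[i];
    }
    free(tmp);
}
```

The paper's recursive multiplying generalises each round to a k-way exchange. The abstract gives no implementation details, so the following is a hedged sketch of one plausible round structure, assuming the rank count is an exact power of k and approximating the paper's message pipelining with nonblocking point-to-point operations; rm_allreduce_sum is likewise an invented name.

```c
#include <mpi.h>
#include <stdlib.h>

/* Sketch: k-way "recursive multiplying" AllReduce (sum of doubles).
 * Assumes the communicator size is an exact power of k. */
static void rm_allreduce_sum(double *buf, int count, int k, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    double *tmp = malloc((size_t)(k - 1) * count * sizeof(double));
    MPI_Request *reqs = malloc(2 * (size_t)(k - 1) * sizeof(MPI_Request));

    /* log_k(N) rounds: ranks that agree on every other base-k digit
     * form a group of k; each member exchanges its partial result
     * with the k-1 group peers, overlapping the messages. */
    for (int step = 1; step < nprocs; step *= k) {
        int digit = (rank / step) % k;
        int base  = rank - digit * step;
        int n = 0;
        for (int j = 0; j < k; j++) {
            if (j == digit) continue;
            int peer = base + j * step;
            MPI_Irecv(tmp + (size_t)n * count, count, MPI_DOUBLE,
                      peer, 0, comm, &reqs[2 * n]);
            MPI_Isend(buf, count, MPI_DOUBLE, peer, 0, comm,
                      &reqs[2 * n + 1]);
            n++;
        }
        MPI_Waitall(2 * n, reqs, MPI_STATUSES_IGNORE);
        for (int p = 0; p < n; p++)
            for (int i = 0; i < count; i++)
                buf[i] += tmp[(size_t)p * count + i];
    }
    free(reqs);
    free(tmp);
}
```

With k = 2 this reduces to the recursive doubling sketch above; larger k trades fewer rounds (log_k N instead of log2 N) for k-1 overlapping messages per round, consistent with the abstract's argument that pipelined multi-way exchange is easier to improve than per-message latency.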




Published In

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016
225 pages
ISBN: 9781450342346
DOI: 10.1145/2966884


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AllReduce
  2. Collective
  3. MPI
  4. Message Pipelining
  5. Recursive Doubling
  6. Scalability
  7. n-way

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • EPSRC

Conference

EuroMPI 2016: The 23rd European MPI Users' Group Meeting
September 25 - 28, 2016
Edinburgh, United Kingdom

Acceptance Rates

Overall Acceptance Rate 66 of 139 submissions, 47%
