Assessing the Performance and Scalability of a Novel Multilevel K-Nomial Allgather on CORE-Direct Systems

Ladd, Joshua S.; Venkata, Manjunath Gorentla; Graham, Richard; Shamis, Pavel

doi:10.1007/978-3-642-32820-6_53

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7484)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

In this paper, we propose a novel allgather algorithm, Reindexed Recursive K-ing (RRK), which combines flexibility in the algorithm’s tree topology and the ability to make asynchronous progress with the CORE-Direct communication offload capability to optimize MPI_Allgather for CORE-Direct enabled systems. In particular, RRK introduces a reindexing scheme that ensures contiguous data transfers while adding only a single additional send and receive operation for any radix, k, or communicator size, N. This allows us to improve the algorithm’s scalability by avoiding the use of a scatter/gather elements (SGE) list on InfiniBand networks. Our implementation and evaluation of the RRK algorithm show that it performs and scales well on CORE-Direct systems for a wide range of message sizes and various communicator configurations.
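
For intuition only, the following Python sketch simulates a generic radix-k recursive exchange allgather in the simplest case, where N is an exact power of k. It is not the RRK algorithm itself: it omits the reindexing scheme, the additional send and receive used for other values of N, and any CORE-Direct offload. The function names and the in-memory "ranks" are assumptions made for illustration. The sketch shows why contiguity matters: in this easy case, every rank holds one contiguous range of blocks after each round, so each message could be sent from a single buffer region without an SGE list.

```python
def is_contiguous(blocks):
    """Return True if the sorted block indices form one unbroken range."""
    return blocks == list(range(blocks[0], blocks[0] + len(blocks)))


def simulate_recursive_k_allgather(n, k):
    """Simulate a radix-k recursive exchange allgather for n == k**m ranks.

    Returns (held, rounds, always_contiguous), where held[r] lists the block
    indices that rank r owns at the end. Hypothetical helper, not a real API.
    """
    if k < 2:
        raise ValueError("radix k must be at least 2")
    size = 1
    while size < n:
        size *= k
    if size != n:
        raise ValueError("this sketch only handles n == k**m")

    held = [[r] for r in range(n)]  # each rank starts with its own block
    always_contiguous = True
    rounds = 0
    dist = 1  # distance between exchange partners in the current round
    while dist < n:
        new_held = []
        for r in range(n):
            digit = (r // dist) % k            # r's base-k digit for this round
            base = r - digit * dist            # group member whose digit is 0
            peers = [base + d * dist for d in range(k)]  # includes r itself
            merged = sorted(b for p in peers for b in held[p])
            always_contiguous = always_contiguous and is_contiguous(merged)
            new_held.append(merged)
        held = new_held
        dist *= k
        rounds += 1
    return held, rounds, always_contiguous


if __name__ == "__main__":
    held, rounds, ok = simulate_recursive_k_allgather(9, 3)
    print(rounds, ok, held[4])
```

With N = 9 and k = 3, each rank exchanges with k - 1 = 2 peers per round and finishes in log_k N = 2 rounds. When N is not a power of k, a plain scheme like this one no longer applies directly, which is the situation the paper's reindexing is designed to handle.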

Author information

Authors and Affiliations

Computer Science and Mathematics Division, Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN, 37831, USA
Joshua S. Ladd, Manjunath Gorentla Venkata, Richard Graham & Pavel Shamis

Editor information

Editors and Affiliations

University of Patras, Computer Technology Institute and Press “Diophantus”, N. Kazantzaki, 26504, Rio, Greece
Christos Kaklamanis
University of Patras, University Building B, 26504, Rio, Greece
Theodore Papatheodorou
Computer Technology Institute and Press “Diophantus”, University of Patras, N. Kazantzaki, 26504, Rio, Greece
Paul G. Spirakis

About this paper

Cite this paper

Ladd, J.S., Venkata, M.G., Graham, R., Shamis, P. (2012). Assessing the Performance and Scalability of a Novel Multilevel K-Nomial Allgather on CORE-Direct Systems. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_53

DOI: https://doi.org/10.1007/978-3-642-32820-6_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)
