The design and implementation of MPI collective operations for clusters in long-and-fast networks


Abstract

Several MPI systems have been proposed for Grid environments, in which clusters are connected by wide-area networks. However, the collective communication algorithms in these systems assume wide-area networks of relatively low bandwidth, and they are not designed for the fast wide-area networks that are now becoming available. For single-cluster MPI systems, on the other hand, a bcast algorithm by van de Geijn et al. and an allreduce algorithm by Rabenseifner have been proposed, both of which are efficient in environments with high bisection bandwidth. We modify these algorithms to utilize fast wide-area inter-cluster networks effectively and to control the number of nodes that transfer data simultaneously over the wide-area network, thereby avoiding congestion. We confirmed the effectiveness of the modified algorithms by experiments in an emulated 10 Gbps WAN environment consisting of two clusters, where each cluster comprises nodes with 1 Gbps Ethernet links and a switch with a 10 Gbps uplink; the two clusters are connected through a 10 Gbps WAN emulator that can insert latency. With 10 milliseconds of inserted latency and a message size of 32 MB, the proposed bcast and allreduce are 1.6 and 3.2 times faster, respectively, than the algorithms used in existing MPI systems for Grid environments.
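The proposed bcast builds on the van de Geijn algorithm [2, 3], which splits a broadcast into a scatter followed by an allgather so that each link carries only a fraction of the message rather than the whole of it. Below is a minimal sketch of that base pattern in C using standard MPI calls; the function name, the use of MPI_BYTE, and the assumption that the message size divides evenly by the number of processes are mine, and the paper's actual contribution (exploiting the fast inter-cluster link and limiting how many nodes send over the WAN at once) is not shown.

    #include <mpi.h>

    /* Scatter-allgather broadcast (van de Geijn et al.): the root scatters
     * the message in equal chunks, then an allgather reassembles the full
     * message on every rank.  Hypothetical helper; assumes count is a
     * multiple of the communicator size. */
    static void bcast_scatter_allgather(char *buf, int count, int root,
                                        MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int chunk = count / size;   /* assumed to divide evenly */

        /* Phase 1: the root distributes one chunk to each rank.  Non-root
         * ranks receive their chunk at the offset where the in-place
         * allgather expects it; the root keeps its own chunk in place. */
        MPI_Scatter(buf, chunk, MPI_BYTE,
                    rank == root ? MPI_IN_PLACE : (void *)(buf + rank * chunk),
                    chunk, MPI_BYTE, root, comm);

        /* Phase 2: all ranks exchange chunks in place, so each rank ends
         * up holding the complete message. */
        MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                      buf, chunk, MPI_BYTE, comm);
    }

Rabenseifner's allreduce [17] has the same shape, with a reduce-scatter in place of the scatter: each rank first reduces one segment of the data, and the allgather then redistributes the reduced segments to all ranks.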


References

  1. Allman, M., Kruse, H., Ostermann, S.: An application-level solution to TCP’s satellite inefficiencies. In: Proc. of the First Intl. Workshop on Satellite-based Information Services (WOSBIS) (1997)

  2. Barnett, M., Gupta, S., Payne, D.G., Shuler, L., van de Geijn, R., Watts, J.: Interprocessor collective communication library (InterCom). In: Proc. of the Scalable High Performance Computing Conference, pp. 357–364 (1994)

  3. Barnett, M., Gupta, S., Payne, D.G., Shuler, L., van de Geijn, R., Watts, J.: Building a high-performance collective communication library. In: Proc. of the 1994 Conference on Supercomputing (SC94), pp. 107–116 (1994)

  4. den Burger, M., Kielmann, T., Bal, H.E.: Balanced multicasting: high-throughput communication for grid applications. In: Proc. of the 2005 ACM/IEEE Conference on Supercomputing (SC’05) (2005)

  5. Chan, E., van de Geijn, R., Gropp, W., Thakur, R.: Collective communication on architectures that support simultaneous communication over multiple links. In: Proc. of the ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP 2006), pp. 2–11 (2006)

  6. Gabriel, E., Resch, M., Rühle, R.: Implementing MPI with optimized algorithms for metacomputing. In: Proc. of the Third MPI Developer’s and User’s Conference (MPIDC’99), pp. 31–41 (1999)

  7. GLIF: Global Lambda Integrated Facility, http://www.glif.is

  8. GridMPI Project, http://www.gridmpi.org

  9. GtrcNET-10, http://www.gtrc.aist.go.jp/gnet

  10. Ishikawa, Y.: YAMPII official home page, http://www.il.is.s.u-tokyo.ac.jp/yampii

  11. Karonis, N.T., de Supinski, B.R., Foster, I.T., Gropp, W., Lusk, E.L., Lacour, S.: A multilevel approach to topology-aware collective operations in computational grids. Tech. Rep. ANL/MCS-P948-0402 (2002)

  12. Kielmann, T., Bal, H.E., Gorlatch, S.: Bandwidth-efficient collective communication for clustered wide area systems. In: Proc. of the 14th Intl. Parallel and Distributed Processing Symp., pp. 492–499 (1999)

  13. Kodama, Y., Kudoh, T., Takano, R., Sato, H., Tatebe, O., Sekiguchi, S.: GNET-1: gigabit Ethernet network testbed. In: IEEE Intl. Conf. on Cluster Computing (Cluster2004), pp. 185–192 (2004)

  14. Matsuda, M., Ishikawa, Y., Kudoh, T.: Evaluation of MPI implementations on grid-connected clusters using an emulated WAN environment. In: 3rd Intl. Symp. on Cluster Computing and the Grid (CCGrid2003), pp. 10–17 (2003)

  15. Matsuda, M., Ishikawa, Y., Kudoh, T., Kodama, Y., Takano, R.: TCP adaptation for MPI on long-and-fat networks. In: IEEE Intl. Conf. on Cluster Computing (Cluster2005), pp. 1–10 (2005)

  16. Rabenseifner, R.: Automatic MPI counter profiling of all users: first results on a CRAY T3E 900-512. In: Proc. of the Message Passing Interface Developers and Users Conference 1999 (MPIDC99), pp. 77–85 (1999)

  17. Rabenseifner, R.: Optimization of collective reduction operations. In: Intl. Conf. on Computational Science, LNCS 3036, pp. 1–9. Springer (2004)

  18. Takano, R., Kudoh, T., Kodama, Y., Matsuda, M., Tezuka, H., Ishikawa, Y.: Design and evaluation of precise software pacing mechanisms for fast long-distance networks. In: 3rd Intl. Workshop on Protocols for Fast Long-Distance Networks (PFLDnet05) (2005)

  19. TeraGrid, http://www.teragrid.org

  20. Thakur, R., Rabenseifner, R., Gropp, W.: Optimization of collective communication operations in MPICH. Int. J. High Perform. Comput. Appl. 19(1), 49–66 (2005)



Corresponding author

Correspondence to Motohiko Matsuda.



Cite this article

Matsuda, M., Kudoh, T., Kodama, Y. et al. The design and implementation of MPI collective operations for clusters in long-and-fast networks. Cluster Comput 11, 45–55 (2008). https://doi.org/10.1007/s10586-007-0050-7
