Complete exchange on the CM-5 and Touchstone Delta

Thakur, Rajeev; Ponnusamy, Ravi; Choudhary, Alok; Fox, Geoffrey

doi:10.1007/BF01901612

Published: December 1995

Volume 8, pages 305–328 (1995)
The Journal of Supercomputing

Abstract

The complete exchange (or all-to-all personalized) communication pattern occurs frequently in many important parallel computing applications. It is the densest form of communication because every processor must communicate with every other processor. This can cause severe link contention and degrade performance considerably. Hence, efficient algorithms are needed to obtain good performance over a wide range of message sizes and multiprocessor sizes. In this paper we present several algorithms to perform complete exchange on the Thinking Machines CM-5 and the Intel Touchstone Delta multiprocessors. Since these machines have different architectures and communication capabilities, different algorithms are needed to get the best performance on each of them. We present four algorithms for the CM-5 and six algorithms for the Delta. Complete exchange algorithms generally assume that the number of processors is a power of two. However, on the Delta the number of processors allocated by a user need not be a power of two, so we propose algorithms that are applicable even to non-power-of-two meshes. We have developed analytical models to estimate the performance of the algorithms from system parameters. Performance results on the CM-5 and Delta are also presented and analyzed.
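The abstract does not spell out the individual algorithms, so the Python sketch below is only a hypothetical illustration of what a complete-exchange schedule looks like, not a reproduction of any algorithm from the paper. It compares two textbook step schedules: pairwise XOR exchange, which assumes the processor count p is a power of two, and a cyclic shift, which works for any p. The function names, the `simulate` helper, and the cost parameters `t_s` (per-message startup time) and `t_b` (per-byte transfer time) are assumptions introduced here. The simple cost model ignores link contention, which on real networks such as those of the CM-5 and Delta is exactly what makes the choice of algorithm matter.

```python
# Hypothetical sketch of complete-exchange step schedules (not the paper's algorithms).
from typing import Dict, List, Tuple

Step = List[Tuple[int, int]]  # (source, destination) pairs active in one step


def xor_schedule(p: int) -> List[Step]:
    """Pairwise exchange: in step k = 1..p-1, processor i swaps blocks with i XOR k.

    Assumes p is a power of two so that i ^ k is always a valid rank.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("XOR pairwise exchange assumes p is a power of two")
    return [[(i, i ^ k) for i in range(p)] for k in range(1, p)]


def shift_schedule(p: int) -> List[Step]:
    """Cyclic shift: in step k = 1..p-1, processor i sends to (i + k) mod p.

    Valid for any p >= 1, including non-power-of-two processor counts.
    """
    return [[(i, (i + k) % p) for i in range(p)] for k in range(1, p)]


def simulate(p: int, schedule: List[Step]) -> Dict[int, Dict[int, str]]:
    """Replay a schedule and return, for each processor, the blocks it received."""
    received: Dict[int, Dict[int, str]] = {j: {} for j in range(p)}
    for step in schedule:
        # Within one step, every processor sends exactly one block and receives exactly one.
        assert sorted(src for src, _ in step) == list(range(p))
        assert sorted(dst for _, dst in step) == list(range(p))
        for src, dst in step:
            received[dst][src] = f"block {src}->{dst}"
    return received


def linear_cost(p: int, m: float, t_s: float, t_b: float) -> float:
    """Contention-free estimate (assumed model): p - 1 steps, each costing t_s + m * t_b."""
    return (p - 1) * (t_s + m * t_b)


if __name__ == "__main__":
    for p, sched in [(8, xor_schedule(8)), (6, shift_schedule(6))]:
        got = simulate(p, sched)
        # Complete exchange is done when every processor holds a block from every other one.
        complete = all(set(got[j]) == set(range(p)) - {j} for j in range(p))
        print(p, len(sched), complete)
```

Running it prints `8 7 True` and then `6 5 True`: both schedules finish in p - 1 steps and deliver every block. Calling `xor_schedule(6)` raises an error, which is one concrete reason a non-power-of-two processor count needs a different schedule.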

Author information

Authors and Affiliations

Northeast Parallel Architectures Center, Syracuse University, 111 College Place, Rm. 3-228, Syracuse, NY 13244-4100
Rajeev Thakur, Ravi Ponnusamy, Alok Choudhary & Geoffrey Fox
Department of Electrical and Computer Engineering, Syracuse University, USA
Rajeev Thakur & Alok Choudhary
Department of Computer and Information Science, Syracuse University, USA
Ravi Ponnusamy & Geoffrey Fox

About this article

Cite this article

Thakur, R., Ponnusamy, R., Choudhary, A. et al. Complete exchange on the CM-5 and Touchstone Delta. J Supercomput 8, 305–328 (1995). https://doi.org/10.1007/BF01901612

Received: 15 March 1993
Accepted: 15 June 1994
Issue Date: December 1995
DOI: https://doi.org/10.1007/BF01901612