Abstract
The complete exchange (or all-to-all personalized) communication pattern occurs frequently in many important parallel computing applications. It is the densest form of communication because every processor must communicate with every other processor. This can cause severe link contention and degrade performance considerably. Efficient algorithms are therefore needed to obtain good performance over a wide range of message and multiprocessor sizes. In this paper we present several algorithms to perform complete exchange on the Thinking Machines CM-5 and the Intel Touchstone Delta multiprocessors. Because these machines have different architectures and communication capabilities, different algorithms are needed to get the best performance on each of them. We present four algorithms for the CM-5 and six algorithms for the Delta. Complete exchange algorithms generally assume that the number of processors is a power of two. On the Delta, however, the number of processors allocated by a user need not be a power of two, so we propose algorithms that also apply to non-power-of-two meshes on the Delta. We have developed analytical models to estimate the performance of the algorithms on the basis of system parameters. Performance results on the CM-5 and Delta are also presented and analyzed.
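To make the communication pattern concrete, the following is a minimal sketch of complete exchange simulated in plain Python. It is not taken from the paper: the abstract does not describe the individual algorithms, so the two schedules shown here (a pairwise XOR schedule that assumes a power-of-two processor count, and a cyclic-shift schedule that works for any count) are generic textbook illustrations. All function names are hypothetical, and link contention, message size, and network topology are not modeled.

```python
def pairwise_schedule(p):
    """XOR schedule: in step s, rank r exchanges with rank r ^ s.

    Assumption for this sketch: p must be a power of two, otherwise
    r ^ s can fall outside the range 0..p-1.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("pairwise (XOR) schedule assumes a power-of-two count")
    return [[rank ^ step for rank in range(p)] for step in range(1, p)]


def shift_schedule(p):
    """Cyclic-shift schedule: in step s, rank r sends to (r + s) mod p
    and receives from (r - s) mod p. Valid for any p >= 1."""
    return [[((rank + step) % p, (rank - step) % p) for rank in range(p)]
            for step in range(1, p)]


def complete_exchange(send, schedule_kind="shift"):
    """Simulate complete exchange on p ranks.

    send[i][j] is the block that rank i holds for rank j.
    Returns recv, where recv[j][i] == send[i][j] for all i, j.
    """
    p = len(send)
    recv = [[None] * p for _ in range(p)]
    for rank in range(p):
        recv[rank][rank] = send[rank][rank]  # the local block needs no message

    if schedule_kind == "pairwise":
        for partners in pairwise_schedule(p):
            for rank, partner in enumerate(partners):
                recv[partner][rank] = send[rank][partner]
    else:
        for step in shift_schedule(p):
            for rank, (dest, _src) in enumerate(step):
                recv[dest][rank] = send[rank][dest]
    return recv


if __name__ == "__main__":
    p = 6  # a non-power-of-two count, so only the shift schedule applies
    send = [[(src, dst) for dst in range(p)] for src in range(p)]
    recv = complete_exchange(send, "shift")
    # Every rank ends up holding exactly the blocks addressed to it.
    print(all(recv[dst][src] == (src, dst)
              for src in range(p) for dst in range(p)))
```

Both schedules take p - 1 communication steps. The XOR schedule pairs ranks so that each step is a set of disjoint two-way exchanges, which is why it needs a power-of-two count; the shift schedule trades that pairing for generality.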
Cite this article
Thakur, R., Ponnusamy, R., Choudhary, A. et al. Complete exchange on the CM-5 and Touchstone Delta. J Supercomput 8, 305–328 (1995). https://doi.org/10.1007/BF01901612
DOI: https://doi.org/10.1007/BF01901612