Abstract
Considerable effort has been devoted over the past decade to reducing software overhead, which is known as the major hindrance to high-speed communication. However, this paper shows that a low-latency communication system does not by itself guarantee high performance, because other communication issues, such as contention and the scheduling of communication events, are not fully addressed by low-latency communication. Using the complete exchange operation as a case study, we show that carefully designed communication schedules can achieve efficient communication while preventing congestion. We have developed a complete exchange algorithm, the Synchronous Shuffle Exchange, which is optimal on a non-blocking network. To avoid congestion loss caused by non-deterministic delays in communication events, we introduce a global congestion control scheme. This scheme coordinates all participating nodes to monitor and regulate the traffic load, which effectively avoids congestion loss while maintaining sufficient throughput to maximize performance. To improve the effectiveness of the congestion control scheme on hierarchical networks, we incorporate network topology information to devise a contention-aware permutation. This permutation scheme generates a communication schedule that is free of both node and switch contention and distributes the network load more evenly across the hierarchy. This relieves the congestion build-up at the uplink ports and improves the synchronism of the traffic information exchange between cluster nodes. We report performance results of our implementation on a 32-node cluster with various network configurations.
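To make the notion of a node-contention-free complete exchange schedule concrete, the following minimal sketch builds a simple cyclic-shift schedule and checks that no node sends or receives more than one message per round. This is a generic textbook schedule used purely for illustration; it is not the Synchronous Shuffle Exchange or the contention-aware permutation described above, and the function names `shift_schedule` and `is_node_contention_free` are hypothetical.

```python
# Illustrative only: a cyclic-shift schedule for complete exchange among p nodes.
# This is NOT the paper's Synchronous Shuffle Exchange; names are hypothetical.

def shift_schedule(p):
    """Return p-1 rounds; in round k, node i sends to node (i + k) % p."""
    return [[(i, (i + k) % p) for i in range(p)] for k in range(1, p)]

def is_node_contention_free(round_pairs):
    """True if every node sends at most once and receives at most once."""
    senders = [src for src, _ in round_pairs]
    receivers = [dst for _, dst in round_pairs]
    return (len(set(senders)) == len(senders)
            and len(set(receivers)) == len(receivers))

if __name__ == "__main__":
    # Print each round of a 4-node exchange with its contention check.
    for k, pairs in enumerate(shift_schedule(4), start=1):
        print(k, pairs, is_node_contention_free(pairs))
```

A schedule like this avoids contention only at the end nodes. It does not account for switch contention or uplink load in a hierarchical network, which is the problem the contention-aware permutation is designed to address.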
Cite this article
Tam, A.T., Wang, CL. Contention-Aware Communication Schedule for High-Speed Communication. Cluster Computing 6, 339–353 (2003). https://doi.org/10.1023/A:1025765910100
DOI: https://doi.org/10.1023/A:1025765910100