Abstract
Group communication significantly influences the performance of data-parallel applications. It is typically required in two situations: array redistribution from phase to phase, and array remapping after loop partitioning. Nevertheless, an important factor in the efficiency of group communication is often neglected: long communication idle time can occur when there is node contention and when message lengths differ within a single communication step. This paper develops an efficient scheduling strategy that uses the compile-time information provided by array subscripts, array distribution patterns, and array access periods. The strategy not only avoids inter-processor contention but also minimizes the real communication cost in each communication step. Experimental results show that the strategy performs better than the traditional implementation of MPI_Alltoallv, alltoall-based scheduling, and greedy scheduling.
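To make the two cost factors in the abstract concrete, here is a minimal Python sketch. It does not reproduce the paper's scheduling strategy; it uses the classic cyclic-shift schedule, in which processor i sends to processor (i + k) mod P in step k, purely as an illustration. The function names and the message-length matrix are hypothetical. The sketch shows that a step can be free of node contention and still leave processors idle when message lengths within the step differ.

```python
from typing import List, Tuple

# One message in a step: (sender, receiver, message_length).
Message = Tuple[int, int, int]
Step = List[Message]


def cyclic_shift_schedule(lengths: List[List[int]]) -> List[Step]:
    """Split an all-to-all exchange into contention-free steps.

    lengths[i][j] is the (assumed) message length from processor i to j.
    In step k, processor i sends to (i + k) mod P, so each processor
    sends at most one message and receives at most one message per step.
    """
    p = len(lengths)
    schedule: List[Step] = []
    for k in range(1, p):
        step = [
            (i, (i + k) % p, lengths[i][(i + k) % p])
            for i in range(p)
            if lengths[i][(i + k) % p] > 0  # skip empty messages
        ]
        if step:
            schedule.append(step)
    return schedule


def is_contention_free(step: Step) -> bool:
    """True if no processor sends twice or receives twice in the step."""
    senders = [s for s, _, _ in step]
    receivers = [r for _, r, _ in step]
    return len(set(senders)) == len(senders) and len(set(receivers)) == len(receivers)


def step_cost(step: Step) -> int:
    """A step lasts as long as its longest message."""
    return max(length for _, _, length in step)


def idle_time(step: Step) -> int:
    """Total time senders wait for the longest message in the step."""
    cost = step_cost(step)
    return sum(cost - length for _, _, length in step)


if __name__ == "__main__":
    # Hypothetical message lengths for three processors.
    lengths = [
        [0, 4, 1],
        [2, 0, 4],
        [4, 3, 0],
    ]
    for number, step in enumerate(cyclic_shift_schedule(lengths), start=1):
        print(number, step, step_cost(step), idle_time(step))
```

In this made-up example the first step carries three messages of equal length, so no processor waits, while the second step mixes lengths 1, 2, and 3 and accumulates idle time even though it is contention-free. Grouping messages of similar length into the same step is the kind of improvement the abstract refers to when it speaks of minimizing the real communication cost in each step.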
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, J., Hu, C., Li, J. (2007). Contention-Free Communication Scheduling for Group Communication in Data Parallelism. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2007: CoopIS, DOA, ODBASE, GADA, and IS. OTM 2007. Lecture Notes in Computer Science, vol 4804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76843-2_16
DOI: https://doi.org/10.1007/978-3-540-76843-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76835-7
Online ISBN: 978-3-540-76843-2
eBook Packages: Computer Science, Computer Science (R0)