Abstract
Most previous studies on tiling focus on partitioning the iteration space. On distributed-memory parallel systems, however, the decomposition of computation and the distribution of data must be handled together in order to balance load and minimize data migration. In this paper, we formulate the problem of globally optimal tiling as a 0-1 integer linear program that minimizes total execution time. To simplify the selection of tiling parameters, we restrict tiles to semi-oblique shapes and present two effective approaches for choosing the tile shape in multi-dimensional semi-oblique tiling. In addition, we present a hyperplane-based tile-to-processor mapping scheme that can express diverse forms of parallelism and achieves better performance than traditional methods. Experiments with NPB2.3-serial SP and LU on a Qsnet-connected cluster achieved average parallel efficiencies of 87% and 73%, respectively.
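As a rough illustration of what a hyperplane-based tile-to-processor mapping can look like, the sketch below assigns each tile to a processor according to the hyperplane it lies on. The abstract does not specify the mapping rule, so everything here is an assumption: the names `map_tile` and `tiles_per_processor`, the normal vector `normal`, and the cyclic (modulo) assignment are hypothetical and are not taken from the paper.

```python
from itertools import product

def map_tile(tile, normal, num_procs):
    """Assign a tile to a processor by the hyperplane it lies on.

    Hypothetical rule (an assumption, not the paper's scheme): tiles whose
    coordinates have the same dot product with `normal` share a hyperplane,
    and hyperplanes are dealt out to processors cyclically.
    """
    return sum(h * t for h, t in zip(normal, tile)) % num_procs

def tiles_per_processor(grid, normal, num_procs):
    """Count how many tiles of a rectangular tile grid land on each processor."""
    counts = [0] * num_procs
    for tile in product(*(range(n) for n in grid)):
        counts[map_tile(tile, normal, num_procs)] += 1
    return counts

# Two different normals give two different distributions of a 4x4 tile grid
# over 4 processors: whole rows per processor, or whole anti-diagonals.
row_wise = tiles_per_processor((4, 4), (1, 0), 4)
diagonal = tiles_per_processor((4, 4), (1, 1), 4)
```

Changing only the normal vector switches between a row-wise and a diagonal distribution, which is one way a single hyperplane parameter could express different forms of parallelism. Both choices happen to balance the load evenly on this small grid.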
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, L., Chen, L., Wu, C., Feng, Xb. (2008). Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems. In: Luque, E., Margalef, T., Benítez, D. (eds) Euro-Par 2008 – Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85451-7_41
DOI: https://doi.org/10.1007/978-3-540-85451-7_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85450-0
Online ISBN: 978-3-540-85451-7
eBook Packages: Computer Science (R0)