Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems

Liu, Lei; Chen, Li; Wu, ChengYong; Feng, Xiao-bing

doi:10.1007/978-3-540-85451-7_41

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5168)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Most previous studies on tiling focus on partitioning the iteration space. On distributed-memory parallel systems, however, the decomposition of computation and the distribution of data must be handled together to achieve load balance and minimize data migration. In this paper, we formulate the problem of globally optimal tiling as a 0-1 integer linear program that minimizes total execution time. To simplify the selection of tiling parameters, we restrict tiles to semi-oblique shapes and present two effective approaches for choosing the tile shape in multi-dimensional semi-oblique tiling. We also present a hyperplane-based tile-to-processor mapping scheme, which can express diverse forms of parallelism and performs better than traditional methods. Experiments with NPB2.3-serial SP and LU on a Qsnet-connected cluster achieved average parallel efficiencies of 87% and 73%, respectively.
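
The abstract does not give the details of the mapping scheme, so the following is only a minimal sketch of what a hyperplane-based tile-to-processor mapping could look like, together with the usual definition of parallel efficiency, E = T_serial / (p * T_parallel). The function names (`map_tile`, `assignment`, `parallel_efficiency`), the rectangular tile grid, and the cyclic `mod` assignment rule are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product


def map_tile(tile, hyperplane, num_procs):
    """Map a tile to a processor id (hypothetical rule, not the paper's).

    Tiles lying on the same hyperplane h . t = c get the same owner;
    successive hyperplanes are dealt out cyclically over the processors.
    """
    return sum(h * t for h, t in zip(hyperplane, tile)) % num_procs


def assignment(grid, hyperplane, num_procs):
    """Return {processor: [tiles]} for a rectangular grid of tiles."""
    owners = {p: [] for p in range(num_procs)}
    for tile in product(*(range(n) for n in grid)):
        owners[map_tile(tile, hyperplane, num_procs)].append(tile)
    return owners


def parallel_efficiency(t_serial, t_parallel, num_procs):
    """Standard definition: speedup divided by the number of processors."""
    return t_serial / (num_procs * t_parallel)


if __name__ == "__main__":
    # Hyperplane (1, 0): each processor owns one row of tiles.
    print(assignment((4, 4), (1, 0), 4)[0])
    # Hyperplane (1, 1): each processor owns a wrapped diagonal of tiles.
    print(assignment((4, 4), (1, 1), 4)[0])
    # 80 s serial, 12.5 s on 8 processors.
    print(parallel_efficiency(80.0, 12.5, 8))
```

Changing the hyperplane normal changes which tiles share a processor, which is one way a single parameter can express different forms of parallelism. The actual scheme in the paper may differ.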

Author information

Authors and Affiliations

Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing, China
Lei Liu, Li Chen, ChengYong Wu & Xiao-bing Feng
Graduate School of the Chinese Academy of Sciences, 100080, Beijing, China
Lei Liu

Editor information

Emilio Luque, Tomàs Margalef, Domingo Benítez

About this paper

Cite this paper

Liu, L., Chen, L., Wu, C., Feng, Xb. (2008). Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems. In: Luque, E., Margalef, T., Benítez, D. (eds) Euro-Par 2008 – Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85451-7_41

DOI: https://doi.org/10.1007/978-3-540-85451-7_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85450-0
Online ISBN: 978-3-540-85451-7
eBook Packages: Computer Science, Computer Science (R0)
