Abstract
The communication layer of modern HPC platforms is becoming increasingly heterogeneous and hierarchical. As a result, even on platforms with homogeneous processors, the communication cost of many parallel applications varies significantly depending on how their processes are mapped to the processors of the platform. The optimal mapping, which minimizes the communication cost of the application, depends strongly on the structure and performance of the network as well as on the logical communication flow of the application. In our previous work, we proposed a general approach and two approximate heuristic algorithms for minimizing the communication cost of data-parallel applications with a two-dimensional symmetric communication pattern on heterogeneous hierarchical networks, and we tested these algorithms on parallel matrix multiplication. In this paper, we build on one of these heuristic approaches to develop a new algorithm in the context of a real-life application, MPDATA, one of the major parts of the EULAG geophysical model. We carefully study the communication flow of MPDATA and find that, even if the communication network is assumed to be perfectly homogeneous, the logical communication links of this application have different bandwidths, which makes optimizing its communication cost particularly challenging. The new algorithm is based on the cost functions of one of our general heuristic algorithms, and we apply it to optimize the communication cost of MPDATA, whose communication pattern is asymmetric and heterogeneous. We also present experimental results demonstrating the performance gains due to this optimization.
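The dependence of communication cost on the process mapping can be illustrated with a small sketch. The Python code below is only a toy model under stated assumptions: it is not the cost function of the paper's heuristics and not MPDATA's real communication pattern. It assumes a hypothetical 2 × 4 logical grid of processes exchanging data with their nearest neighbours, two nodes of four cores each, an additive cost equal to the link volume times an assumed unit cost (1.0 within a node, 10.0 between nodes), and made-up volumes `w_h` for links along a grid row and `w_v` for links along a grid column. All names (`comm_cost`, `to_nodes`, `best_mapping`) are hypothetical. Exhaustive search is used only because the example is tiny.

```python
from itertools import combinations

# Toy model; every parameter is an assumption for illustration only.
# A P x Q logical grid of processes (rank = i*Q + j) talks to its grid
# neighbours and is mapped onto two nodes of C cores each.
P, Q, C = 2, 4, 4
COST_IN, COST_OUT = 1.0, 10.0  # assumed cost per unit volume within / between nodes


def logical_links(w_h, w_v):
    """Yield (a, b, volume) for each undirected neighbour link of the grid."""
    for i in range(P):
        for j in range(Q):
            r = i * Q + j
            if j + 1 < Q:
                yield r, r + 1, w_h  # link along a grid row
            if i + 1 < P:
                yield r, r + Q, w_v  # link along a grid column


def comm_cost(node_of, w_h, w_v):
    """Additive cost: each link's volume times its intra- or inter-node unit cost."""
    total = 0.0
    for a, b, vol in logical_links(w_h, w_v):
        total += vol * (COST_IN if node_of[a] == node_of[b] else COST_OUT)
    return total


def to_nodes(node0):
    """Ranks in the set node0 run on node 0; all other ranks run on node 1."""
    return tuple(0 if r in node0 else 1 for r in range(P * Q))


def best_mapping(w_h, w_v):
    """Exhaustive search over every way of placing C ranks on node 0."""
    best = None
    for members in combinations(range(P * Q), C):
        node_of = to_nodes(set(members))
        cost = comm_cost(node_of, w_h, w_v)
        if best is None or cost < best[0]:  # keep the first minimum found
            best = (cost, node_of)
    return best


if __name__ == "__main__":
    rows = to_nodes({0, 1, 2, 3})  # one grid row per node
    cols = to_nodes({0, 1, 4, 5})  # two grid columns per node
    for w_h, w_v in [(1.0, 1.0), (3.0, 1.0)]:
        print(w_h, w_v, comm_cost(rows, w_h, w_v), comm_cost(cols, w_h, w_v),
              best_mapping(w_h, w_v))
    # prints:
    # 1.0 1.0 46.0 28.0 (28.0, (0, 0, 1, 1, 0, 0, 1, 1))
    # 3.0 1.0 58.0 76.0 (58.0, (0, 0, 0, 0, 1, 1, 1, 1))
```

With equal volumes, placing two grid columns on each node is cheapest (28.0 versus 46.0 for one row per node); making the row links three times heavier reverses the ranking (58.0 versus 76.0). In this toy setting, unequal logical-link volumes alone change which mapping is optimal, even though the processors and the physical network are unchanged. For realistic process counts the search space grows combinatorially, which is why approximate heuristics are used instead of enumeration.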
A. Lastovetsky: This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 14/IA/2474. This research was conducted with the financial support of NCN under grant no. UMO-2015/17/D/ST6/04059. This work is partially supported by the EU under the COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS). Experiments were carried out on Grid’5000, developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Malik, T., Szustak, L., Wyrzykowski, R., Lastovetsky, A. (2016). Network-Aware Optimization of MPDATA on Homogeneous Multi-core Clusters with Heterogeneous Network. In: Carretero, J., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10049. Springer, Cham. https://doi.org/10.1007/978-3-319-49956-7_3
DOI: https://doi.org/10.1007/978-3-319-49956-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49955-0
Online ISBN: 978-3-319-49956-7
eBook Packages: Computer Science, Computer Science (R0)