Abstract
As the heterogeneity of high-performance computing platforms and the scale of data-parallel applications increase significantly, data partitioning becomes a key issue. Recent works generally use computation performance models to optimize the data-partitioning algorithm. However, these methods do not take communication overhead into account, making them unsuitable for applications with a high communication ratio or an unbalanced communication topology. In this paper, a new heterogeneity-aware data-partitioning algorithm, HaDPA, is proposed. First, given a partition topology, the computation and communication overheads are predicted by suitable computation and communication performance models. Then, a search tree is constructed, and a hierarchical depth-first search with branch and bound is designed to obtain the optimal solution; together with the construction of the optimization model, these steps make up the whole HaDPA process. Finally, to verify the performance of the algorithm, matrix multiplication and axial compressor rotor applications are tested on the TianHe-2A supercomputer. Experimental results show that HaDPA can effectively reduce the execution time of data-parallel applications. Moreover, the factors driving the performance improvement are analyzed and explained. A regression model shows that the communication-to-computation ratio matters more to data partitioning on heterogeneous HPC platforms. In addition, compared with HPOPTA, the improvement ratio of HaDPA increases with a higher communication ratio and a lower heterogeneity of the hardware platform.
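The core idea described above — predicting per-device computation and communication costs from performance models, then pruning a depth-first search over candidate partitions with branch and bound — can be illustrated with a minimal sketch. This is not the paper's actual HaDPA implementation; the cost models `comp_time` and `comm_time` below are hypothetical linear placeholders, and the search is a flat (non-hierarchical) branch-and-bound DFS over integer row splits that minimizes the predicted makespan.

```python
# Illustrative sketch of model-driven partitioning with branch-and-bound
# DFS. The cost models are hypothetical stand-ins for the computation and
# communication performance models the abstract refers to.

def comp_time(device, rows):
    # Hypothetical computation model: seconds per row for each device type.
    speed = {"cpu": 1.0, "gpu": 0.25}[device]
    return speed * rows

def comm_time(device, rows):
    # Hypothetical communication model: fixed latency plus per-row transfer.
    latency = {"cpu": 0.0, "gpu": 2.0}[device]
    return latency + 0.1 * rows

def partition(devices, total_rows):
    """Search all integer splits of total_rows across devices and return
    (best_makespan, best_assignment), pruning branches whose partial
    makespan already exceeds the best complete solution found so far."""
    best = [float("inf"), None]

    def dfs(i, remaining, assignment, makespan_so_far):
        if makespan_so_far >= best[0]:       # branch-and-bound pruning
            return
        device = devices[i]
        if i == len(devices) - 1:            # last device takes the rest
            cost = max(makespan_so_far,
                       comp_time(device, remaining) +
                       comm_time(device, remaining))
            if cost < best[0]:
                best[0], best[1] = cost, assignment + [remaining]
            return
        for rows in range(remaining + 1):
            cost = comp_time(device, rows) + comm_time(device, rows)
            dfs(i + 1, remaining - rows, assignment + [rows],
                max(makespan_so_far, cost))

    dfs(0, total_rows, [], 0.0)
    return best[0], best[1]

makespan, split = partition(["cpu", "gpu"], 20)
```

With these toy models, the search balances the slower CPU against the GPU's higher fixed communication latency, assigning fewer rows to the CPU. The pruning step is what makes the approach viable at scale: any partial assignment whose cost already exceeds the incumbent solution is discarded without expanding its subtree.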
References
Li, J., Zhang, X., Han, L., Ji, Z., Dong, X., Hu, C.: OKCM: improving parallel task scheduling in high-performance computing systems using online learning. J. Supercomput. 77(6), 5960–5983 (2020). https://doi.org/10.1007/s11227-020-03506-5
Top500 (2020). https://www.top500.org/lists/top500/2020/11. Accessed 16 June 2021
Khaleghzadeh, H., Manumachu, R.R., Lastovetsky, A.L.: A novel data-partitioning algorithm for performance optimization of data-parallel applications on heterogeneous HPC platforms. IEEE Trans. Parallel Distrib. Syst. 29(10), 2176–2190 (2018)
Li, J., Zhang, X., Zhou, J., Dong, X., Zhang, C.: swHPFM: refactoring and optimizing the structured grid fluid mechanical algorithm on the Sunway TaihuLight supercomputer. Appl. Sci. 10(1), 72–93 (2020)
Martínez, J.A., Garzón, E.M., Plaza, A., García, I.: Automatic tuning of iterative computation on heterogeneous multiprocessors with ADITHE. J. Supercomput. 58(2), 151–159 (2011)
Song, F., Tomov, S., Dongarra, J.J.: Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems. In: International Conference on Supercomputing, ICS 2012, Venice, Italy, June 25–29, 2012, pp. 365–376. ACM (2012)
Lastovetsky, A.L., Manumachu, R.R.: New model-based methods and algorithms for performance and energy optimization of data parallel applications on homogeneous multicore clusters. IEEE Trans. Parallel Distrib. Syst. 28(4), 1119–1133 (2017)
Marrakchi, S., Jemni, M.: Static scheduling with load balancing for solving triangular band linear systems on multicore processors. Fundam. Informaticae 179(1), 35–58 (2021)
Khaleghzadeh, H., Deldari, H., Reddy, R., Lastovetsky, A.: Hierarchical multicore thread mapping via estimation of remote communication. J. Supercomput. 74(3), 1321–1340 (2017). https://doi.org/10.1007/s11227-017-2176-6
Giordano, A., Rango, A.D., Rongo, R., D'Ambrosio, D., Spataro, W.: Dynamic load balancing in parallel execution of cellular automata. IEEE Trans. Parallel Distrib. Syst. 32(2), 470–484 (2021)
Li, M., Chen, C., Zhu, G., Savaria, Y.: Local queueing-based data-driven task scheduling for multicore systems. In: IEEE 61st International Midwest Symposium on Circuits and Systems, MWSCAS 2018, Windsor, ON, Canada, 5–8 August, 2018, pp. 897–900. IEEE (2018)
Lastovetsky, A.L., Reddy, R.: Data partitioning with a functional performance model of heterogeneous processors. Int. J. High Perform. Comput. Appl. 21(1), 76–90 (2007)
Culler, D.E., Karp, R.M., Patterson, D.A., Sahay, A., Schauser, K.E., Santos, E., Subramonian, R., von Eicken, T.: LogP: towards a realistic model of parallel computation. In: Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), San Diego, California, USA, 19–22 May 1993, pp. 1–12. ACM (1993)
Alexandrov, A.D., Ionescu, M.F., Schauser, K.E., Scheiman, C.J.: LogGP: incorporating long messages into the LogP model for parallel computation. J. Parallel Distrib. Comput. 44(1), 71–79 (1997)
Yuan, L., Zhang, Y., Tang, Y., Rao, L., Sun, X.: LogGPH: a parallel computational model with hierarchical communication awareness. In: 13th IEEE International Conference on Computational Science and Engineering, CSE 2010, Hong Kong, China, 11–13 December 2010, pp. 268–274. IEEE Computer Society (2010)
Chen, W., Zhai, J., Zhang, J., Zheng, W.: LogGPO: an accurate communication model for performance prediction of MPI programs. Sci. China Ser. F Inf. Sci. 52(10), 1785–1791 (2009)
Cameron, K.W., Ge, R., Sun, X.: log_nP and log_3P: accurate analytical models of point-to-point communication in distributed systems. IEEE Trans. Comput. 56(3), 314–327 (2007)
Tu, B., Fan, J., Zhan, J., Zhao, X.: Performance analysis and optimization of MPI collective operations on multi-core clusters. J. Supercomput. 60(1), 141–162 (2012)
Rico-Gallego, J., Martín, J.C.D.: τ-Lop: modeling performance of shared memory MPI. Parallel Comput. 46, 14–31 (2015)
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30, pp. 4765–4774. Curran Associates, Inc. (2017)
Acknowledgment
The work was funded by the National Key Research and Development Program of China (2016YFB0200902).
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Li, J., Han, L., Qu, Y., Zhang, X. (2022). HaDPA: A Data-Partition Algorithm for Data Parallel Applications on Heterogeneous HPC Platforms. In: Lai, Y., Wang, T., Jiang, M., Xu, G., Liang, W., Castiglione, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2021. Lecture Notes in Computer Science(), vol 13156. Springer, Cham. https://doi.org/10.1007/978-3-030-95388-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95387-4
Online ISBN: 978-3-030-95388-1
eBook Packages: Computer Science; Computer Science (R0)