Abstract
As the heterogeneity of high-performance computing platforms and the scale of data-parallel applications increase significantly, data partitioning becomes a key issue. Recent works generally use computation performance models to optimize the data-partitioning algorithm. However, these methods do not take communication overhead into account, making them unsuitable for applications with a high communication ratio or an unbalanced communication topology. In this paper, a new heterogeneity-aware data-partitioning algorithm, HaDPA, is proposed. First, given a partition topology, the computation and communication overheads are predicted by suitable computation and communication performance models. Then, a search tree is constructed, and a hierarchical depth-first search with branch and bound is designed to obtain the optimal solution; together with the construction of the optimization model, these steps make up the whole HaDPA process. Finally, to verify the performance of the algorithm, matrix multiplication and axial compressor rotor applications are tested on the TianHe-2A supercomputer. Experimental results show that HaDPA can effectively reduce the execution time of data-parallel applications. Moreover, the factors driving the performance improvement are analyzed and explained. A regression model shows that the communication-to-computation ratio matters more to data partitioning on heterogeneous HPC platforms. In addition, compared with HPOPTA, the improvement ratio of HaDPA increases with a higher communication ratio and a lower heterogeneity of the hardware platform.
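The core idea described above — predicting per-device computation and communication costs from performance models, then pruning a depth-first search over candidate partitions with branch and bound — can be illustrated with a minimal sketch. This is not the paper's actual HaDPA implementation; the cost models `comp_time` and `comm_time` below are hypothetical linear placeholders, and the search is a flat (non-hierarchical) branch-and-bound DFS over integer row splits that minimizes the predicted makespan.

```python
# Illustrative sketch of model-driven partitioning with branch-and-bound
# DFS. The cost models are hypothetical stand-ins for the computation and
# communication performance models the abstract refers to.

def comp_time(device, rows):
    # Hypothetical computation model: seconds per row for each device type.
    speed = {"cpu": 1.0, "gpu": 0.25}[device]
    return speed * rows

def comm_time(device, rows):
    # Hypothetical communication model: fixed latency plus per-row transfer.
    latency = {"cpu": 0.0, "gpu": 2.0}[device]
    return latency + 0.1 * rows

def partition(devices, total_rows):
    """Search all integer splits of total_rows across devices and return
    (best_makespan, best_assignment), pruning branches whose partial
    makespan already exceeds the best complete solution found so far."""
    best = [float("inf"), None]

    def dfs(i, remaining, assignment, makespan_so_far):
        if makespan_so_far >= best[0]:       # branch-and-bound pruning
            return
        device = devices[i]
        if i == len(devices) - 1:            # last device takes the rest
            cost = max(makespan_so_far,
                       comp_time(device, remaining) +
                       comm_time(device, remaining))
            if cost < best[0]:
                best[0], best[1] = cost, assignment + [remaining]
            return
        for rows in range(remaining + 1):
            cost = comp_time(device, rows) + comm_time(device, rows)
            dfs(i + 1, remaining - rows, assignment + [rows],
                max(makespan_so_far, cost))

    dfs(0, total_rows, [], 0.0)
    return best[0], best[1]

makespan, split = partition(["cpu", "gpu"], 20)
```

With these toy models, the search balances the slower CPU against the GPU's higher fixed communication latency, assigning fewer rows to the CPU. The pruning step is what makes the approach viable at scale: any partial assignment whose cost already exceeds the incumbent solution is discarded without expanding its subtree.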
References
Li, J., Zhang, X., Han, L., Ji, Z., Dong, X., Hu, C.: OKCM: improving parallel task scheduling in high-performance computing systems using online learning. J. Supercomput. 77(6), 5960–5983 (2020). https://doi.org/10.1007/s11227-020-03506-5
Top500 (2020). https://www.top500.org/lists/top500/2020/11. Accessed 16 June 2021
Khaleghzadeh, H., Manumachu, R.R., Lastovetsky, A.L.: A novel data-partitioning algorithm for performance optimization of data-parallel applications on heterogeneous HPC platforms. IEEE Trans. Parallel Distrib. Syst. 29(10), 2176–2190 (2018)
Li, J., Zhang, X., Zhou, J., Dong, X., Zhang, C.: swHPFM: refactoring and optimizing the structured grid fluid mechanical algorithm on the Sunway TaihuLight supercomputer. Appl. Sci. 10(1), 72–93 (2020)
Martínez, J.A., Garzón, E.M., Plaza, A., García, I.: Automatic tuning of iterative computation on heterogeneous multiprocessors with ADITHE. J. Supercomput. 58(2), 151–159 (2011)
Song, F., Tomov, S., Dongarra, J.J.: Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems. In: International Conference on Supercomputing, ICS 2012, Venice, Italy, June 25–29, 2012, pp. 365–376. ACM (2012)
Lastovetsky, A.L., Manumachu, R.R.: New model-based methods and algorithms for performance and energy optimization of data parallel applications on homogeneous multicore clusters. IEEE Trans. Parallel Distrib. Syst. 28(4), 1119–1133 (2017)
Marrakchi, S., Jemni, M.: Static scheduling with load balancing for solving triangular band linear systems on multicore processors. Fundam. Informaticae 179(1), 35–58 (2021)
Khaleghzadeh, H., Deldari, H., Reddy, R., Lastovetsky, A.: Hierarchical multicore thread mapping via estimation of remote communication. J. Supercomput. 74(3), 1321–1340 (2017). https://doi.org/10.1007/s11227-017-2176-6
Giordano, A., Rango, A.D., Rongo, R., D'Ambrosio, D., Spataro, W.: Dynamic load balancing in parallel execution of cellular automata. IEEE Trans. Parallel Distrib. Syst. 32(2), 470–484 (2021)
Li, M., Chen, C., Zhu, G., Savaria, Y.: Local queueing-based data-driven task scheduling for multicore systems. In: IEEE 61st International Midwest Symposium on Circuits and Systems, MWSCAS 2018, Windsor, ON, Canada, 5–8 August, 2018, pp. 897–900. IEEE (2018)
Lastovetsky, A.L., Reddy, R.: Data partitioning with a functional performance model of heterogeneous processors. Int. J. High Perform. Comput. Appl. 21(1), 76–90 (2007)
Culler, D.E., Karp, R.M., Patterson, D.A., Sahay, A., Schauser, K.E., Santos, E., Subramonian, R., von Eicken, T.: LogP: towards a realistic model of parallel computation. In: Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), San Diego, California, USA, 19–22 May 1993, pp. 1–12. ACM (1993)
Alexandrov, A.D., Ionescu, M.F., Schauser, K.E., Scheiman, C.J.: LogGP: incorporating long messages into the LogP model for parallel computation. J. Parallel Distrib. Comput. 44(1), 71–79 (1997)
Yuan, L., Zhang, Y., Tang, Y., Rao, L., Sun, X.: LogGPH: a parallel computational model with hierarchical communication awareness. In: 13th IEEE International Conference on Computational Science and Engineering, CSE 2010, Hong Kong, China, 11–13 December 2010, pp. 268–274. IEEE Computer Society (2010)
Chen, W., Zhai, J., Zhang, J., Zheng, W.: LogGPO: an accurate communication model for performance prediction of MPI programs. Sci. China Ser. F Inf. Sci. 52(10), 1785–1791 (2009)
Cameron, K.W., Ge, R., Sun, X.: log_nP and log_3P: accurate analytical models of point-to-point communication in distributed systems. IEEE Trans. Comput. 56(3), 314–327 (2007)
Tu, B., Fan, J., Zhan, J., Zhao, X.: Performance analysis and optimization of MPI collective operations on multi-core clusters. J. Supercomput. 60(1), 141–162 (2012)
Rico-Gallego, J., Martín, J.C.D.: τ-Lop: modeling performance of shared memory MPI. Parallel Comput. 46, 14–31 (2015)
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30, pp. 4765–4774. Curran Associates, Inc. (2017)
Acknowledgment
The work was funded by the National Key Research and Development Program of China (2016YFB0200902).
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Li, J., Han, L., Qu, Y., Zhang, X. (2022). HaDPA: A Data-Partition Algorithm for Data Parallel Applications on Heterogeneous HPC Platforms. In: Lai, Y., Wang, T., Jiang, M., Xu, G., Liang, W., Castiglione, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2021. Lecture Notes in Computer Science(), vol 13156. Springer, Cham. https://doi.org/10.1007/978-3-030-95388-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95387-4
Online ISBN: 978-3-030-95388-1
eBook Packages: Computer Science; Computer Science (R0)