HaDPA: A Data-Partition Algorithm for Data Parallel Applications on Heterogeneous HPC Platforms

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13156)

Abstract

As the heterogeneity of high-performance computing platforms and the scale of data-parallel applications increase significantly, data partitioning becomes a key issue. Recent works generally use computation performance models to optimize the data-partition algorithm. However, these methods do not take communication overhead into account, making them unsuitable for applications with a high communication ratio or an unbalanced communication topology. In this paper, a new heterogeneity-aware data-partition algorithm, HaDPA, is proposed. First, given a partition topology, the computation and communication overheads are predicted by suitable computation and communication performance models. Then, a search tree is constructed, and a hierarchical depth-first search with branch and bound is designed to obtain the optimal solution; together with the construction of the optimization model, this makes up the whole HaDPA process. Finally, to verify the performance of the algorithm, matrix multiplication and axial compressor rotor applications are tested on the TianHe-2A supercomputer. Experimental results show that HaDPA can effectively reduce the execution time of data-parallel applications. Moreover, the factors affecting the performance improvement are analyzed and explained. A regression model confirms that the communication-to-computation ratio matters more to data partitioning on heterogeneous HPC platforms. Besides, compared with HPOPTA, HaDPA's improvement ratio increases with a higher communication ratio and a lower hardware heterogeneity.
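The core of HaDPA, as the abstract describes it, is a depth-first search over candidate partitions, pruned by branch and bound against predicted computation-plus-communication times. A minimal, non-hierarchical sketch of that idea follows; the linear cost models and all names (`hadpa_sketch`, `predicted_time`, `comm_cost`) are illustrative assumptions, not the paper's fitted performance models or actual implementation.

```python
def predicted_time(part, speeds, comm_cost):
    # Predicted parallel time: the slowest processor dominates.
    # Computation is work/speed; communication is a simple linear
    # term in the work size -- both stand in for fitted models.
    return max(w / s + comm_cost * w for w, s in zip(part, speeds))

def hadpa_sketch(total, speeds, comm_cost):
    """Depth-first search with branch and bound over integer
    partitions of `total` work units across len(speeds) processors."""
    n = len(speeds)
    best = {"time": float("inf"), "part": None}

    def dfs(i, remaining, part):
        if i == n - 1:                      # last processor takes the rest
            cand = part + [remaining]
            t = predicted_time(cand, speeds, comm_cost)
            if t < best["time"]:
                best["time"], best["part"] = t, cand
            return
        for w in range(remaining + 1):
            cand = part + [w]
            # Bound: the time of the work assigned so far is a lower
            # bound on any completion of this partial partition;
            # prune if it cannot beat the incumbent.
            if predicted_time(cand, speeds, comm_cost) >= best["time"]:
                continue
            dfs(i + 1, remaining - w, cand)

    dfs(0, total, [])
    return best["part"], best["time"]
```

For example, `hadpa_sketch(10, [1.0, 2.0], 0.0)` assigns the faster processor roughly twice the work. The real algorithm replaces the toy cost functions with fitted computation and communication models and searches hierarchically over the partition topology.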


References

  1. Li, J., Zhang, X., Han, L., Ji, Z., Dong, X., Hu, C.: OKCM: improving parallel task scheduling in high-performance computing systems using online learning. J. Supercomput. 77(6), 5960–5983 (2020). https://doi.org/10.1007/s11227-020-03506-5

  2. Top500 (2020). https://www.top500.org/lists/top500/2020/11. Accessed 16 June 2021

  3. Khaleghzadeh, H., Manumachu, R.R., Lastovetsky, A.L.: A novel data-partitioning algorithm for performance optimization of data-parallel applications on heterogeneous HPC platforms. IEEE Trans. Parallel Distrib. Syst. 29(10), 2176–2190 (2018)

  4. Li, J., Zhang, X., Zhou, J., Dong, X., Zhang, C.: swHPFM: refactoring and optimizing the structured grid fluid mechanical algorithm on the Sunway TaihuLight supercomputer. Appl. Sci. 10(1), 72–93 (2020)

  5. Martínez, J.A., Garzón, E.M., Plaza, A., García, I.: Automatic tuning of iterative computation on heterogeneous multiprocessors with ADITHE. J. Supercomput. 58(2), 151–159 (2011)

  6. Song, F., Tomov, S., Dongarra, J.J.: Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems. In: International Conference on Supercomputing, ICS 2012, Venice, Italy, 25–29 June 2012, pp. 365–376. ACM (2012)

  7. Lastovetsky, A.L., Manumachu, R.R.: New model-based methods and algorithms for performance and energy optimization of data parallel applications on homogeneous multicore clusters. IEEE Trans. Parallel Distrib. Syst. 28(4), 1119–1133 (2017)

  8. Marrakchi, S., Jemni, M.: Static scheduling with load balancing for solving triangular band linear systems on multicore processors. Fundam. Informaticae 179(1), 35–58 (2021)

  9. Khaleghzadeh, H., Deldari, H., Reddy, R., Lastovetsky, A.: Hierarchical multicore thread mapping via estimation of remote communication. J. Supercomput. 74(3), 1321–1340 (2017). https://doi.org/10.1007/s11227-017-2176-6

  10. Giordano, A., Rango, A.D., Rongo, R., D’Ambrosio, D., Spataro, W.: Dynamic load balancing in parallel execution of cellular automata. IEEE Trans. Parallel Distrib. Syst. 32(2), 470–484 (2021)

  11. Li, M., Chen, C., Zhu, G., Savaria, Y.: Local queueing-based data-driven task scheduling for multicore systems. In: IEEE 61st International Midwest Symposium on Circuits and Systems, MWSCAS 2018, Windsor, ON, Canada, 5–8 August, 2018, pp. 897–900. IEEE (2018)

  12. Lastovetsky, A.L., Reddy, R.: Data partitioning with a functional performance model of heterogeneous processors. Int. J. High Perform. Comput. Appl. 21(1), 76–90 (2007)

  13. Culler, D.E., Karp, R.M., Patterson, D.A., Sahay, A., Schauser, K.E., Santos, E., Subramonian, R., von Eicken, T.: LogP: towards a realistic model of parallel computation. In: Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), San Diego, California, USA, 19–22 May 1993, pp. 1–12. ACM (1993)

  14. Alexandrov, A.D., Ionescu, M.F., Schauser, K.E., Scheiman, C.J.: LogGP: incorporating long messages into the LogP model for parallel computation. J. Parallel Distrib. Comput. 44(1), 71–79 (1997)

  15. Yuan, L., Zhang, Y., Tang, Y., Rao, L., Sun, X.: LogGPH: a parallel computational model with hierarchical communication awareness. In: 13th IEEE International Conference on Computational Science and Engineering, CSE 2010, Hong Kong, China, 11–13 December 2010, pp. 268–274. IEEE Computer Society (2010)

  16. Chen, W., Zhai, J., Zhang, J., Zheng, W.: LogGPO: an accurate communication model for performance prediction of MPI programs. Sci. China Ser. F Inf. Sci. 52(10), 1785–1791 (2009)

  17. Cameron, K.W., Ge, R., Sun, X.: log_nP and log_3P: accurate analytical models of point-to-point communication in distributed systems. IEEE Trans. Comput. 56(3), 314–327 (2007)

  18. Tu, B., Fan, J., Zhan, J., Zhao, X.: Performance analysis and optimization of MPI collective operations on multi-core clusters. J. Supercomput. 60(1), 141–162 (2012)

  19. Rico-Gallego, J., Martín, J.C.D.: τ-Lop: modeling performance of shared memory MPI. Parallel Comput. 46, 14–31 (2015)

  20. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30, pp. 4765–4774. Curran Associates, Inc. (2017)

Acknowledgment

The work was funded by the National Key Research and Development Program of China (2016YFB0200902).

Author information

Correspondence to Xingjun Zhang.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, J., Han, L., Qu, Y., Zhang, X. (2022). HaDPA: A Data-Partition Algorithm for Data Parallel Applications on Heterogeneous HPC Platforms. In: Lai, Y., Wang, T., Jiang, M., Xu, G., Liang, W., Castiglione, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2021. Lecture Notes in Computer Science, vol 13156. Springer, Cham. https://doi.org/10.1007/978-3-030-95388-1_12

  • DOI: https://doi.org/10.1007/978-3-030-95388-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95387-4

  • Online ISBN: 978-3-030-95388-1

  • eBook Packages: Computer Science, Computer Science (R0)
