Algorithms for tree-shaped task partition and allocation on heterogeneous multiprocessors

The Journal of Supercomputing

Abstract

An application consisting of a set of dependent distributed tasks is generally modeled as a directed acyclic graph or an out-tree. Tree-shaped task graphs are widely used in a variety of computational domains, including electronic structure computations and sparse matrix factorization. As most relevant publications have pointed out, efficient algorithms for partitioning and allocating tree-shaped tasks can largely determine the performance of heterogeneous computing systems. This paper presents efficient algorithms for partitioning and allocating tree-shaped tasks on heterogeneous multiprocessor systems with limited memory, with the aim of improving task-parallel computing. The proposed main algorithm consists of two stages: partition and allocation. In the partition stage, an algorithm partitions the task tree into multiple subtrees by iteratively partitioning the subtrees that lie on the critical path of the quotient tree. In the allocation stage, two algorithms are proposed for allocating tasks so as to minimize the execution time of the task tree: one preferentially allocates the largest subtree of the whole tree, and the other preferentially allocates the subtree located on the quotient tree’s critical path. Experimental results show that the proposed algorithms significantly improve on the latest works in terms of average makespan, both on randomly generated trees and on a real-world dataset. On the real-world dataset, the average makespan is approximately \(6.28\times 10^8\) for existing work and approximately \(2.13\times 10^8\) for the proposed algorithm, a reduction of 64.33%. On randomly generated trees, the average makespan is approximately 8973 for existing work and approximately 4040 for the proposed algorithm, a reduction of 54.96%.
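The notions of a quotient tree and its critical path can be made concrete with a small sketch. The code below is an illustration only and is not the algorithm proposed in the paper: it assumes a simplified model in which each task has a single work weight, communication costs, processor speeds and memory limits are ignored, and the critical path is taken to be the heaviest root-to-leaf path. The names `parent`, `weight`, `part`, `quotient_tree` and `critical_path` are hypothetical.

```python
from collections import defaultdict


def quotient_tree(parent, weight, part):
    """Collapse each part of a partitioned task tree into one quotient node.

    parent[v] is the parent of task v (None for the root), weight[v] is its
    work, and part[v] is the id of the subtree v belongs to. Each part is
    assumed to be a connected piece of the tree.
    """
    q_weight = defaultdict(float)
    q_parent = {}
    for v, w in weight.items():
        q_weight[part[v]] += w
        p = parent[v]
        if p is None:
            q_parent[part[v]] = None          # part holding the tree root
        elif part[p] != part[v]:
            q_parent[part[v]] = part[p]       # edge that crosses two parts
    return dict(q_weight), q_parent


def critical_path(q_weight, q_parent):
    """Return (length, path) of the heaviest root-to-leaf path."""
    children = defaultdict(list)
    root = None
    for node, p in q_parent.items():
        if p is None:
            root = node
        else:
            children[p].append(node)

    # Pre-order walk; reversing it visits every node after its descendants.
    order, stack = [], [root]
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(children[n])

    best = {}  # node -> (heaviest path weight starting here, next node)
    for n in reversed(order):
        nxt = max(children[n], key=lambda c: best[c][0], default=None)
        tail = best[nxt][0] if nxt is not None else 0.0
        best[n] = (q_weight[n] + tail, nxt)

    path, n = [], root
    while n is not None:
        path.append(n)
        n = best[n][1]
    return best[root][0], path


# Toy task tree: a is the root; b and c are children of a; d is a child of b;
# e and f are children of c. The partition into four parts is made up.
parent = {"a": None, "b": "a", "c": "a", "d": "b", "e": "c", "f": "c"}
weight = {"a": 2, "b": 3, "c": 1, "d": 5, "e": 5, "f": 1}
part = {"a": 0, "b": 1, "d": 1, "c": 2, "e": 2, "f": 3}

q_weight, q_parent = quotient_tree(parent, weight, part)
print(critical_path(q_weight, q_parent))  # (10.0, [0, 1])
```

In this toy instance the critical path runs through parts 0 and 1, so under the simplified model those are the parts a critical-path-driven heuristic would look at first, whereas a largest-subtree-first ordering would simply sort the parts by `q_weight`.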

Data availability

The assembly-tree dataset is generated from a set of sparse matrices, which can be obtained from the University of Florida Sparse Matrix Collection: http://www.cise.ufl.edu/research/sparse/matrices/.

Acknowledgements

Part of this work was presented at the 2021 IEEE International Conference on Parallel & Distributed Processing with Applications, Sept. 30 – Oct. 3, 2021, New York, USA.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62072118 and 62202108. It was also supported in part by the Huangpu International Sci & Tech Cooperation Foundation of Guangzhou, China, under Grant No. 2021GH12, and by the Guangdong Natural Science Foundation under Grant Nos. 2023A1515011230, 2023A1515030183 and 2021B1515120010.

Author information

Contributions

S.H., J.W. and B.W. conceived of the presented idea. J.W. encouraged S.H. to investigate the critical path and supervised the findings of this work. S.H. carried out the experiment and wrote the main manuscript text. All authors discussed the results and revised the manuscript.

Corresponding author

Correspondence to Jigang Wu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Cite this article

He, S., Wu, J., Wei, B. et al. Algorithms for tree-shaped task partition and allocation on heterogeneous multiprocessors. J Supercomput 79, 13210–13240 (2023). https://doi.org/10.1007/s11227-023-05186-3
