Degree-of-Node Task Scheduling of Fine-Grained Parallel Programs on Heterogeneous Systems

Lin, Han; Li, Ming-Fan; Jia, Cheng-Fan; Liu, Jun-Nan; An, Hong

doi:10.1007/s11390-019-1962-4


Regular Paper
Published: 06 September 2019

Volume 34, pages 1096–1108 (2019)

Journal of Computer Science and Technology



Abstract

Processor specialization has become the dominant trend in the modern processor industry, and it is likely to remain the mainstream approach for the coming decades of the semiconductor era. As heterogeneous systems grow more diverse, systems with multiple kinds of processors will become the norm, and organizing computation efficiently on them is a challenging problem. In this paper, we analyze several state-of-the-art task scheduling algorithms for heterogeneous computing systems and propose the Degree of Node First (DONF) algorithm for scheduling fine-grained parallel programs on heterogeneous systems. The major innovations of DONF are: 1) a simplified task priority calculation for fine-grained parallel programs based on directed acyclic graphs (DAGs), which not only reduces the complexity of task selection but also enables the algorithm to schedule dynamic DAGs; and 2) a novel communication model in the processor selection phase that makes task scheduling much more efficient. These are achieved by exploiting fine-grained parallelism through a dataflow program execution model, and they are validated experimentally on a selected set of benchmarks. Results on synthesized and real-world application DAGs show very good performance: the proposed DONF algorithm significantly outperforms all the evaluated state-of-the-art heuristic algorithms in terms of scheduling length ratio (SLR) and efficiency.
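The abstract separates scheduling into a task selection phase (driven by a priority) and a processor selection phase (driven by a communication model), and it evaluates results by SLR. The minimal Python sketch below illustrates that general two-phase list-scheduling structure and the SLR metric. It is not DONF. The priority rule (most successors first among ready tasks, ties broken by name), the function names `degree_first_schedule` and `slr`, and the communication model (a fixed transfer time per edge, zero when both tasks run on the same processor, with each task appended after the processor's last task) are all illustrative assumptions. The paper's actual priority calculation and communication model are defined in the full text. SLR is computed here as the makespan divided by the longest path through the DAG when every task takes its minimum execution cost and communication is ignored, which is one common reading of the CP_MIN lower bound.

```python
from collections import defaultdict


def degree_first_schedule(cost, comm):
    """Greedy list scheduling of a DAG on heterogeneous processors (illustrative).

    cost: {task: [execution time on processor 0, processor 1, ...]}
    comm: {(src, dst): transfer time when src and dst run on different processors}
    Returns ({task: (processor, start, finish)}, makespan).
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in comm:
        succs[u].append(v)
        preds[v].append(u)
    n_procs = len(next(iter(cost.values())))
    proc_free = [0.0] * n_procs                   # time each processor becomes idle
    remaining = {t: len(preds[t]) for t in cost}  # unscheduled predecessors per task
    ready = [t for t in cost if remaining[t] == 0]
    placement = {}
    while ready:
        # Assumed priority: most successors (out-degree) first, ties by task name.
        ready.sort(key=lambda t: (-len(succs[t]), t))
        task = ready.pop(0)
        best = None
        for p in range(n_procs):
            # Inputs are available once every predecessor has finished and, if it
            # ran on another processor, its transfer time has elapsed.
            data_ready = max(
                (placement[u][2] + (0.0 if placement[u][0] == p else comm[(u, task)])
                 for u in preds[task]),
                default=0.0,
            )
            start = max(proc_free[p], data_ready)
            finish = start + cost[task][p]
            if best is None or finish < best[2]:  # earliest finish time wins
                best = (p, start, finish)
        placement[task] = best
        proc_free[best[0]] = best[2]
        for v in succs[task]:  # release successors whose predecessors are all placed
            remaining[v] -= 1
            if remaining[v] == 0:
                ready.append(v)
    makespan = max(f for _, _, f in placement.values())
    return placement, makespan


def slr(cost, comm, makespan):
    """Makespan divided by the longest path using each task's minimum cost."""
    preds = defaultdict(list)
    for u, v in comm:
        preds[v].append(u)
    memo = {}

    def longest_to(t):
        # Longest min-cost path ending at t (communication ignored).
        if t not in memo:
            memo[t] = min(cost[t]) + max((longest_to(u) for u in preds[t]), default=0.0)
        return memo[t]

    return makespan / max(longest_to(t) for t in cost)


# Hypothetical four-task diamond DAG on two processors.
cost = {"A": [2, 3], "B": [3, 1], "C": [4, 4], "D": [2, 2]}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 2, ("C", "D"): 1}
placement, makespan = degree_first_schedule(cost, comm)
print(makespan, slr(cost, comm, makespan))  # 8.0 1.0
```

In this toy DAG, task B moves to processor 1 despite paying a transfer cost from A, because it runs three times faster there and still finishes earlier. Because the assumed priority looks only at tasks that are already ready and at their immediate successors, no whole-graph pass (such as HEFT's upward rank) is needed before scheduling begins. A local priority of this kind is attractive when the DAG is not fully known in advance; how DONF exploits this for dynamic DAGs is described in the paper.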



Author information

Authors and Affiliations

School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
Han Lin, Ming-Fan Li, Cheng-Fan Jia, Jun-Nan Liu & Hong An


Corresponding author

Correspondence to Han Lin.

Electronic supplementary material

ESM 1

(PDF 273 kb)


About this article

Cite this article

Lin, H., Li, MF., Jia, CF. et al. Degree-of-Node Task Scheduling of Fine-Grained Parallel Programs on Heterogeneous Systems. J. Comput. Sci. Technol. 34, 1096–1108 (2019). https://doi.org/10.1007/s11390-019-1962-4


Received: 08 November 2018
Revised: 04 July 2019
Published: 06 September 2019
Issue Date: September 2019
DOI: https://doi.org/10.1007/s11390-019-1962-4
