Abstract
Processor specialization has become the development trend of modern processor industry. It is quite possible that this will still be the main-stream in the next decades of semiconductor era. As the diversity of heterogeneous systems grows, organizing computation efficiently on systems with multiple kinds of heterogeneous processors is a challenging problem and will be a normality. In this paper, we analyze some state-of-the-art task scheduling algorithms of heterogeneous computing systems and propose a Degree of Node First (DONF) algorithm for task scheduling of fine-grained parallel programs on heterogeneous systems. The major innovations of DONF include: 1) simplifying task priority calculation for directed acyclic graph (DAG) based fine-grained parallel programs which not only reduces the complexity of task selection but also enables the algorithm to solve the scheduling problem for dynamic DAGs; 2) building a novel communication model in the processor selection phase that makes the task scheduling much more efficient. They are achieved by exploring finegrained parallelism via a dataflow program execution model, and validated through experimental results with a selected set of benchmarks. The results on synthesized and real-world application DAGs show a very good performance. The proposed DONF algorithm significantly outperforms all the evaluated state-of-the-art heuristic algorithms in terms of scheduling length ratio (SLR) and efficiency.
Similar content being viewed by others
References
Suetterlein J. DARTS: A runtime based on the Codelet execution model [Master Thesis]. University of Delaware, 2014.
Qu P, Yan J, Zhang Y H, Gao G R. Parallel turing machine, a proposal. Journal of Computer Science and Technology, 2017, 32(2): 269-285.
Valiant L G. A bridging model for parallel computation. Communications of the ACM, 1990, 33(8): 103-111.
Zuckerman S, Suetterlein J, Knauerhase R, Gao G R. Using a “codelet” program execution model for exascale machines: Position paper. In Proc. the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, June 2011, pp.64-69.
Suettlerlein J, Zuckerman S, Gao G R. An implementation of the Codelet model. In Proc. the 19th Int. Conf. Parallel Processing, August 2013, pp.633-644.
Ullman J D. NP-complete scheduling problems. Journal of Computer and System Sciences, 1975, 10(3): 384-393.
Lenstra J K, Kan A H G R. Computational complexity of discrete optimization problems. Annals of Discrete Mathematics, 1979, 4: 121-140.
Garey M R, Johnson D S. Computers and Intractability: A Guide to the Theory of NP-Completeness (1st edition). W.H. Freeman, 1979.
Arabnejad H, Barbosa J G. List scheduling algorithm for heterogeneous systems by an optimistic cost table. IEEE Trans. Parallel and Distributed Systems, 2014, 25(3): 682-694.
Dennis J B, Fosseen J B, Linderman J P. Data flow schemas. In Proc. Int. Symp. Theoretical Programming, Aug. 1972, pp.187-216.
Dennis J B. First version of a data flow procedure language. In Proc. Programming Symposium, April 1974, pp.362-376.
Dennis J B. Data flow supercomputers. IEEE Computer, 1980, 13(11): 48-56.
Arvind A, Gostelow K P, Plouffe W. Indeterminacy, monitors, and dataflow. ACM SIGOPS Operating Systems Review, 1977, 11(5): 159-169.
Arvind A, Kathail V. A multiple processor data flow machine that supports generalized procedures. In Proc. the 8th Symp. Computer Architecture, May 1981, pp.291-302.
Watson I, Gurd J. A practical data flow computer. IEEE Computer, 1982, 15(2): 51-57.
Hum H H J, Maquelin O, Theobald K B et al. A design study of the EARTH multiprocessor. In Proc. the IFIP WG10.3 Working Conf. Parallel Architectures and Compilation Techniques, June 1995, pp.59-68.
Farquhar W G, Evripidou P. DART: A data-driven processor architecture for real-time computing. In Proc. the IFIP WG10. 3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, January 1993, pp.141-152.
Evripidou P. Thread synchronization unit (TSU): A building block for high performance computers. In Proc. the 1997 Int. Symp. High Performance Computing, Nov. 1997, pp.107-118.
Topcuouglu H, Hariri S, Wu M Y. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst., 2002, 13(3):260-274.
Bittencourt L F, Sakellariou R, Madeira E R M. DAG scheduling using a lookahead variant of the heterogeneous earliest finish time algorithm. In Proc. the 18th Euromicro Int. Conf. Parallel, Distributed and Network-Based Processing, February 2010, pp.27-34.
Munir E U, Mohsin S, Hussain A, Nisar M W, Ali S. SDBATS: A novel algorithm for task scheduling in heterogeneous computing systems. In Proc. the 27th Int. Symp. Parallel and Distributed Processing Symposium Workshops & PhD Forum, May 2013, pp.43-53.
Wang G, Wang Y X, Liu H, Guo H. HSIP: A novel task scheduling algorithm for heterogeneous computing. Sci. Program., 2016, 2016: Article No. 3676149.
Yelick K, Bonachea D, Chen W Y et al. Productivity and performance using partitioned global address space languages. In Proc. the 2007 International Workshop on Parallel Symbolic Computation, July 2007, pp.24-32.
Murray D G, McSherry F, Isaacs R et al. Naiad: A timely dataflow system. In Proc. the 24th ACM SIGOPS Symp. Operating Systems Principles, Nov. 2013, pp.439-455.
Daoud M I, Kharma N. A high performance algorithm for static task scheduling in heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing, 2008, 68(4): 399-409.
Blumofe R D, Joerg C F, Kuszmaul B C et al. Cilk: An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing, 1996, 37(1): 55-69.
Blumofe R D, Frigo M, Joerg C F et al. DAG-consistent distributed shared memory. In Proc. the 10th International Parallel Processing Symposium, April 1996, pp.132-141.
Augonnet C, Thibault S, Namyst R, Wacrenier P A. StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, 2011, 23(2): 187-198.
Dongarra J, Heroux M A, Luszczek P. HPCG benchmark: A new metric for ranking high performance computing systems. Technical Report, Electrical Engineering and Computer Science Department, Knoxville, Tennessee, 2015. https://www.hpcg-benchmark.org/pubs/index.html, May 2019.
Su Z C, Chen J S, Lin H et al. A dataflow-based runtime support on a 100P actual system. In Proc. the 2017 IEEE Int. Symp. Parallel and Distributed Processing with Applications and the 2017 IEEE Int. Conf. Ubiquitous Computing and Communications, December 2017, pp.599-606.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In Proc. the 3rd Int. Conf. Learning Representations, May 2015, Article No. 4.
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In Proc. the 26th Annual Conference on Neural Information Processing Systems, December 2012, pp.1106-1114.
Sun Y, Liang D, Wang X G et al. DeepID3: Face recognition with very deep neural networks. arXiv:1502.00873, 2015. https://arxiv.org/abs/1502.00873, May 2019.
Dai J F, He K M, Sun J. Convolutional feature masking for joint object and stuff segmentation. In Proc. the 2015 IEEE Conference on Computer Vision and Pattern Recognition, June 2015, pp.3992-4000.
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In Proc. the 2015 IEEE Conference on Computer Vision and Pattern Recognition, June 2015, pp.3431-3440.
Ma C, Huang J B, Yang X K et al. Hierarchical convolutional features for visual tracking. In Proc. the 2015 IEEE Int. Conf. Computer Vision, December 2015, pp.3074-3082.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
ESM 1
(PDF 273 kb)
Rights and permissions
About this article
Cite this article
Lin, H., Li, MF., Jia, CF. et al. Degree-of-Node Task Scheduling of Fine-Grained Parallel Programs on Heterogeneous Systems. J. Comput. Sci. Technol. 34, 1096–1108 (2019). https://doi.org/10.1007/s11390-019-1962-4
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11390-019-1962-4