Abstract
Offloading technologies on smart devices push ever more jobs into cloud data centers, where limited physical resources and competition among jobs make performance a central concern. Since many of these jobs are task-parallel, improving their performance is important. However, because transistor feature sizes are approaching their physical limits, the number of transistors that can be integrated into a single CPU core is severely restricted. Moreover, constrained by cooling efficiency, CPU frequency cannot be raised without bound, since higher frequencies cause energy consumption and heat production to grow rapidly. As the performance gains of new hardware generations have slowed, the era of serial computing is over, and programmers can no longer obtain free application speedups simply by upgrading hardware. Computer architecture is shifting toward parallelism, and squeezing the last bit of performance out of current state-of-the-art architectures is an urgent task for the whole cloud computing community. In this paper we present Function Flow, a C++11-based generic framework for task parallelism. Our insight is that heavy use of generic parallel algorithms in task-parallel programs can introduce numerous unnecessary synchronization operations, degrading application performance. To address this problem, Function Flow provides a DAG-driven task scheduler for programs that can be expressed as a directed acyclic graph (DAG) of tasks with dependency edges. Function Flow distributes worker threads across cores and schedules tasks based purely on each task's state in the DAG constructed by the programmer. Because our implementation is based on a callback mechanism, DAGs are represented compactly and the Function Flow scheduler works in a dynamic, fully distributed manner.
To achieve high performance, the only thing programmers need to do is characterize the dependencies between tasks using the user-friendly interfaces that Function Flow provides. We use several micro-benchmarks to demonstrate the efficiency of our approach and to analyze the framework's performance.
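The scheduling idea described above can be illustrated with a minimal sketch. This is not Function Flow's actual API (the names `Task`, `DagScheduler`, `add_edge`, and `run` are illustrative): each task carries a count of unmet dependencies, and finishing a task acts as a callback that decrements its successors' counts and releases any that become ready. A real implementation would pop ready tasks from per-thread work queues rather than a single serial queue.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Illustrative sketch of DAG-driven task scheduling via callbacks.
struct Task {
    std::function<void()> body;    // work to run when the task is scheduled
    int pending;                   // number of unmet dependencies
    std::vector<Task*> successors; // outgoing dependency edges in the DAG
    explicit Task(std::function<void()> b) : body(std::move(b)), pending(0) {}
};

struct DagScheduler {
    std::queue<Task*> ready; // tasks whose dependencies are all satisfied

    // Record a dependency edge: `to` cannot start until `from` finishes.
    void add_edge(Task& from, Task& to) {
        from.successors.push_back(&to);
        ++to.pending;
    }

    // Run every task exactly once, in an order consistent with the DAG.
    void run(std::vector<Task*> all) {
        for (Task* t : all)
            if (t->pending == 0) ready.push(t); // sources start ready
        while (!ready.empty()) {
            Task* t = ready.front();
            ready.pop();
            t->body();
            // "Callback" on completion: release successors whose last
            // dependency was this task.
            for (Task* s : t->successors)
                if (--s->pending == 0) ready.push(s);
        }
    }
};
```

The scheduler needs no global view of the graph: each completion touches only the finished task's successors, which is what allows a compact DAG representation and fully distributed scheduling decisions.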
Additional information
This article is part of the Topical Collection: Special Issue on Software Defined Networking: Trends, Challenges and Prospective Smart Solutions
Guest Editors: Ahmed E. Kamal, Liangxiu Han, Sohail Jabbar, and Liu Lu
Li, C., Liao, X. & Jin, H. Enhancing application performance via DAG-driven scheduling in task parallelism for cloud center. Peer-to-Peer Netw. Appl. 12, 381–391 (2019). https://doi.org/10.1007/s12083-017-0576-2