Enhancing application performance via DAG-driven scheduling in task parallelism for cloud center

Abstract

Nowadays, offloading technologies on smart devices push ever more jobs into cloud data centers. Inside a data center, physical resources are limited and jobs compete for them, so improving job performance is essential; and since many of these jobs are task-parallel, improving the performance of task parallelism is particularly important. However, because transistor feature sizes are approaching their physical limits, the number of transistors that can be integrated into a single CPU core is severely restricted. Likewise, constrained by cooling efficiency, CPU frequency cannot be raised without bound, as higher frequencies make energy consumption and heat production grow rapidly. With performance gains from new generations of hardware slowing, the era of serial computing is over, and programmers can no longer obtain free application acceleration simply by upgrading hardware. Computer architecture is moving toward parallel structures, and squeezing the last bit of performance out of current state-of-the-art architectures is an urgent task for the whole cloud computing community. In this paper, we present Function Flow, a C++11-based generic framework for task parallelism. Our insight is that heavy use of generic parallel algorithms in task-parallel programs can introduce numerous unnecessary synchronization operations, which degrade application performance. To solve this problem, Function Flow provides a DAG-driven task scheduler for programs that can be expressed as a directed acyclic graph (DAG) of tasks connected by dependency edges. Function Flow distributes worker threads across cores and schedules tasks based purely on each task's state in the DAG constructed by the programmer. Because our implementation is based on a callback mechanism, DAGs are represented compactly and the Function Flow scheduler works in a dynamic, fully distributed manner. To achieve high performance, all programmers need to do is characterize the dependencies between tasks through the user-friendly interfaces that Function Flow provides. We use several micro-benchmarks to demonstrate the efficiency of our approach and to analyze the performance of the framework.
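The paper's actual Function Flow interfaces are not reproduced on this page, so the listing below is only a minimal C++11 sketch of the scheduling idea the abstract describes: each task carries an atomic count of unfinished predecessors, and a finished task "calls back" into the DAG by decrementing its successors' counts and enqueueing any successor that reaches zero. All names here (Task, DagScheduler, add_edge, submit) are hypothetical, not Function Flow's API.

    #include <atomic>
    #include <chrono>
    #include <condition_variable>
    #include <cstdio>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <utility>
    #include <vector>

    // A task is a callable plus its place in the DAG: an atomic count of
    // unfinished predecessors and a list of successor edges.
    struct Task {
        explicit Task(std::function<void()> w) : work(std::move(w)) {}
        std::function<void()> work;
        std::atomic<int> pending{0};
        std::vector<Task*> successors;
    };

    class DagScheduler {
    public:
        explicit DagScheduler(unsigned n) {
            for (unsigned i = 0; i < n; ++i)
                workers_.emplace_back([this] { run(); });
        }
        ~DagScheduler() {
            { std::lock_guard<std::mutex> g(m_); done_ = true; }
            cv_.notify_all();
            for (auto& t : workers_) t.join();
        }
        // Declare an edge: `to` may not start until `from` has finished.
        static void add_edge(Task& from, Task& to) {
            from.successors.push_back(&to);
            to.pending.fetch_add(1, std::memory_order_relaxed);
        }
        // Submit a root task (one whose dependencies are already satisfied).
        void submit(Task& t) { enqueue(&t); }

    private:
        void enqueue(Task* t) {
            { std::lock_guard<std::mutex> g(m_); ready_.push(t); }
            cv_.notify_one();
        }
        void run() {
            for (;;) {
                Task* t;
                {
                    std::unique_lock<std::mutex> lk(m_);
                    cv_.wait(lk, [this] { return done_ || !ready_.empty(); });
                    if (ready_.empty()) return;  // shutting down, nothing left
                    t = ready_.front();
                    ready_.pop();
                }
                t->work();  // execute the task body
                // "Callback" step: finishing a task releases each successor
                // whose dependency count drops to zero.
                for (Task* s : t->successors)
                    if (s->pending.fetch_sub(1, std::memory_order_acq_rel) == 1)
                        enqueue(s);
            }
        }
        std::vector<std::thread> workers_;
        std::queue<Task*> ready_;
        std::mutex m_;
        std::condition_variable cv_;
        bool done_ = false;
    };

    int main() {
        // DAG: a -> b, a -> c, {b, c} -> d.  d runs only after both b and c.
        Task a([] { std::puts("a"); }), b([] { std::puts("b"); }),
             c([] { std::puts("c"); }), d([] { std::puts("d"); });
        DagScheduler::add_edge(a, b);
        DagScheduler::add_edge(a, c);
        DagScheduler::add_edge(b, d);
        DagScheduler::add_edge(c, d);
        DagScheduler sched(4);
        sched.submit(a);  // only the root is initially runnable
        // Crude barrier for this sketch; a real runtime would expose a
        // wait()/get() interface instead of sleeping.
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        return 0;
    }

Because readiness is discovered locally whenever a predecessor finishes, the DAG is held only as per-task counters and successor lists, with no central scan of pending tasks; this corresponds to the compact representation and dynamic, fully distributed scheduling behavior the abstract attributes to Function Flow.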

Author information

Corresponding author

Correspondence to Xiaofei Liao.

Additional information

This article is part of the Topical Collection: Special Issue on Software Defined Networking: Trends, Challenges and Prospective Smart Solutions

Guest Editors: Ahmed E. Kamal, Liangxiu Han, Sohail Jabbar, and Liu Lu

About this article

Cite this article

Li, C., Liao, X. & Jin, H. Enhancing application performance via DAG-driven scheduling in task parallelism for cloud center. Peer-to-Peer Netw. Appl. 12, 381–391 (2019). https://doi.org/10.1007/s12083-017-0576-2
