FunctionFlow: coordinating parallel tasks

Fan, Xuepeng; Liao, Xiaofei; Jin, Hai

doi:10.1007/s11704-016-6286-8

Research Article
Published: 27 April 2018

Volume 13, pages 73–85, (2019)

Frontiers of Computer Science

Abstract

Despite the growing popularity of task-based parallel programming, current task-parallel programming libraries and languages still offer limited support for coordinating parallel tasks. This limitation forces programmers to use additional independent components to coordinate the parallel tasks; these components can be third-party libraries or additional components in the same programming library or language. Moreover, mixing tasks and coordination components increases the difficulty of task-based programming and prevents schedulers from understanding tasks’ dependencies.

In this paper, we propose a task-based parallel programming library, FunctionFlow, which coordinates tasks without requiring additional independent coordination components. First, we use dependency expressions to represent the ubiquitous pattern of waiting on tasks’ termination. The key idea behind dependency expressions is to use && for the termination of both tasks and || for the termination of any task, along with combinations of dependency expressions. Second, as runtime support, we use a lightweight representation for dependency expressions. We also use a suspended-task queue to schedule tasks that still have unfinished prerequisites.
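
To make the idea of dependency expressions concrete, the following is a minimal single-threaded C++ sketch. It is not FunctionFlow's actual API: the names `Task`, `Dep`, `SuspendedQueue`, `park`, and `release_ready` are hypothetical, and the expression tree and queue shown here are only one plausible way to realize "&& for both, || for any" together with a queue of tasks that still wait on prerequisites.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// All names below are hypothetical; this is not FunctionFlow's API.
struct Task {
    std::function<void()> body;
    bool finished = false;
    explicit Task(std::function<void()> b) : body(std::move(b)) {}
    void run() {
        assert(!finished);  // each task terminates at most once
        body();
        finished = true;
    }
};

// A dependency expression: a leaf watches one task; an inner node
// combines two sub-expressions with "both" (&&) or "any" (||).
class Dep {
    enum class Kind { Leaf, All, Any };
    struct Node {
        Kind kind;
        const Task* task;  // set for leaves only; the task must outlive the Dep
        std::shared_ptr<const Node> lhs, rhs;
    };
    std::shared_ptr<const Node> node_;

    Dep(Kind k, const Dep& a, const Dep& b)
        : node_(std::make_shared<Node>(Node{k, nullptr, a.node_, b.node_})) {}

    static bool eval(const Node& n) {
        switch (n.kind) {
            case Kind::Leaf: return n.task->finished;
            case Kind::All:  return eval(*n.lhs) && eval(*n.rhs);
            case Kind::Any:  return eval(*n.lhs) || eval(*n.rhs);
        }
        return false;
    }

public:
    explicit Dep(const Task& t)
        : node_(std::make_shared<Node>(Node{Kind::Leaf, &t, nullptr, nullptr})) {}

    bool satisfied() const { return eval(*node_); }

    // Overloaded && and || build a tree; they do not short-circuit.
    friend Dep operator&&(const Dep& a, const Dep& b) { return Dep(Kind::All, a, b); }
    friend Dep operator||(const Dep& a, const Dep& b) { return Dep(Kind::Any, a, b); }
};

// Holds tasks whose dependency expression is not yet satisfied.
class SuspendedQueue {
    std::deque<std::pair<Task*, Dep>> waiting_;

public:
    void park(Task& t, const Dep& prereq) { waiting_.emplace_back(&t, prereq); }

    // Runs every parked task whose expression holds. The scan restarts after
    // each run, because a finished task may satisfy another expression.
    std::size_t release_ready() {
        std::size_t ran = 0;
        bool progress = true;
        while (progress) {
            progress = false;
            for (auto it = waiting_.begin(); it != waiting_.end(); ++it) {
                if (it->second.satisfied()) {
                    Task* t = it->first;
                    waiting_.erase(it);
                    t->run();
                    ++ran;
                    progress = true;
                    break;
                }
            }
        }
        return ran;
    }

    std::size_t size() const { return waiting_.size(); }
};
```

In this sketch, `q.park(t, Dep(a) && Dep(b))` would hold `t` until both `a` and `b` have finished, while `Dep(a) || Dep(c)` would release its task as soon as either one has. A real runtime would need thread-safe completion flags and would more likely notify waiters on completion than rescan the queue; both are omitted here for brevity.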

Finally, we demonstrate FunctionFlow’s effectiveness in two respects: a case study implementing popular parallel patterns with FunctionFlow, and a performance comparison with the state-of-the-art practice, TBB. Our demonstration shows that FunctionFlow can generally coordinate parallel tasks without involving additional components, with performance comparable to TBB.

Acknowledgements

This paper was supported by the National High-Tech Research and Development Program of China (2015AA015303), and the National Natural Science Foundation of China (Grant No. 61732010).

Author information

Authors and Affiliations

Services Computing Technology and System Lab (SCTS) & Cluster and Grid Computing Lab (CGCL), School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Xuepeng Fan, Xiaofei Liao & Hai Jin

Corresponding author

Correspondence to Hai Jin.

Additional information

Xuepeng Fan is a PhD student in computer science at Huazhong University of Science and Technology (HUST), China. He received his BS degree from HUST in 2009. His research interests focus on performance issues and on building parallel computing systems, including multicore and distributed systems.

Xiaofei Liao received a PhD degree in computer science and engineering from Huazhong University of Science and Technology (HUST), China in 2005. He is now a professor in the School of Computer Science and Engineering at HUST. His research interests are in the areas of system virtualization, system software, and cloud computing.

Hai Jin is a Cheung Kung Scholars Chair Professor of computer science and engineering at Huazhong University of Science and Technology (HUST), China. Jin received his PhD in computer engineering from HUST in 1994. In 1996, he was awarded a German Academic Exchange Service fellowship to visit the Technical University of Chemnitz, Germany. Jin worked at The University of Hong Kong, China between 1998 and 2000, and as a visiting scholar at the University of Southern California, USA between 1999 and 2000. He was awarded the Excellent Youth Award from the National Science Foundation of China in 2001. Jin’s research interests include computer architecture, virtualization technology, cluster computing and cloud computing, peer-to-peer computing, network storage, and network security.

Electronic supplementary material

Supplementary material, approximately 463 KB.

About this article

Cite this article

Fan, X., Liao, X. & Jin, H. FunctionFlow: coordinating parallel tasks. Front. Comput. Sci. 13, 73–85 (2019). https://doi.org/10.1007/s11704-016-6286-8

Received: 29 May 2016
Accepted: 26 September 2016
Published: 27 April 2018
Issue Date: February 2019
DOI: https://doi.org/10.1007/s11704-016-6286-8
