Exploiting controlled-grained parallelism in message-driven stream programs

Su, Yan; Shi, Feng; Talpur, Shahnawaz; Wei, Jin; Tan, Hai

doi:10.1007/s11227-014-1264-0

Published: 30 July 2014

The Journal of Supercomputing, Volume 70, pages 488–509 (2014)

Abstract

As the parallelism available on multicore platforms grows, stream programming has been proposed as an effective way to expose distributed parallelism. There is nonetheless a pressing need for a way of scheduling task and data parallelism in stream programs that delivers robust multicore performance across varying application characteristics. This paper addresses that scheduling problem. We present StreamMDE, an asynchronous concurrent stream programming framework that offers a novel parallel programming model for scheduling task and data parallelism in the message-driven execution paradigm. A key property of the framework is that it exposes controlled-grained parallelism: the granularity of task and data parallelism in the stream graph can be controlled. Our empirical evaluation of StreamMDE shows that mixed task and data parallelism in stream programs can be exploited more efficiently with appropriate granularity control. The framework bridges the gap between the parallel scale and the architecture of stream programs, and it facilitates designing and coding stream features under different schedules.

Author information

Authors and Affiliations

Beijing Institute of Technology, Beijing, China
Yan Su, Feng Shi, Shahnawaz Talpur, Jin Wei & Hai Tan
Mehran University of Engineering and Technology, Jamshoro, Sindh, Pakistan
Shahnawaz Talpur

Corresponding author

Correspondence to Jin Wei.

About this article

Cite this article

Su, Y., Shi, F., Talpur, S. et al. Exploiting controlled-grained parallelism in message-driven stream programs. J Supercomput 70, 488–509 (2014). https://doi.org/10.1007/s11227-014-1264-0

Published: 30 July 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s11227-014-1264-0
