Abstract
As the parallelism available on multicore platforms grows, stream programming has been proposed as an effective way to expose distributed parallelism. There remains, however, a pressing need for a way to schedule task and data parallelism in stream programs that delivers robust multicore performance across varying application characteristics. This paper addresses that scheduling problem. We present StreamMDE, an asynchronous, concurrent stream programming framework that offers a novel parallel programming model for scheduling task and data parallelism under the message-driven execution paradigm. A key property of the framework is that it exposes controlled-grained parallelism, which lets the granularity of task and data parallelism in the stream graph be controlled. Our empirical evaluation of StreamMDE shows that, with appropriate granularity control, mixed task and data parallelism in stream programs can be exploited more efficiently. The framework bridges the gap between the parallel scale and the architecture of stream programs, and it facilitates designing and coding stream features under different schedules.














Cite this article
Su, Y., Shi, F., Talpur, S. et al. Exploiting controlled-grained parallelism in message-driven stream programs. J Supercomput 70, 488–509 (2014). https://doi.org/10.1007/s11227-014-1264-0
DOI: https://doi.org/10.1007/s11227-014-1264-0