Exploiting controlled-grained parallelism in message-driven stream programs

The Journal of Supercomputing

Abstract

With the increasing amount of parallelism available on multicore platforms, stream programming has been proposed as an effective way to expose distributed parallelism. Nonetheless, there remains a pressing need for a way to schedule task and data parallelism in stream programs that delivers robust multicore performance across varying application characteristics. This paper addresses that scheduling problem. We present StreamMDE, an asynchronous, concurrent stream programming framework that offers a novel parallel programming model for scheduling task and data parallelism under the message-driven execution paradigm. A key property of the framework is that it exposes controlled-grained parallelism, which allows the granularity of task and data parallelism in the stream graph to be controlled. Our empirical evaluation of StreamMDE shows that mixed task and data parallelism in stream programs can be exploited more efficiently with appropriate granularity control. The framework bridges the gap between the parallel scale and the architecture of stream programs, and it facilitates designing and coding stream features under different schedules.
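The idea of granularity control can be illustrated with a minimal Python sketch. It is not StreamMDE's API, which this preview does not show: the names `split_into_messages`, `run_stage`, and `grain` are hypothetical, and a standard thread pool only loosely stands in for a message-driven runtime. Each message carries up to `grain` stream items and triggers one task, so a small grain gives many fine tasks and a large grain gives a few coarse ones.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_messages(items, grain):
    """Group stream items into messages of at most `grain` items each."""
    if grain < 1:
        raise ValueError("grain must be at least 1")
    return [items[i:i + grain] for i in range(0, len(items), grain)]


def run_stage(items, work, grain, workers=4):
    """Apply `work` to every item, scheduling one task per message.

    Hypothetical illustration only: `grain` is the granularity knob.
    A small grain means more tasks and more scheduling overhead;
    a large grain means fewer, coarser tasks.
    """
    messages = split_into_messages(items, grain)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps message order, so output order matches input order.
        results = pool.map(lambda msg: [work(x) for x in msg], messages)
        return [y for chunk in results for y in chunk]


def square(x):
    return x * x


fine = run_stage(list(range(8)), square, grain=1)    # 8 one-item messages
coarse = run_stage(list(range(8)), square, grain=4)  # 2 four-item messages
```

Both calls compute the same result; only the number of scheduled tasks differs, which is the trade-off the abstract refers to as granularity control.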

Author information

Corresponding author

Correspondence to Jin Wei.

About this article

Cite this article

Su, Y., Shi, F., Talpur, S. et al. Exploiting controlled-grained parallelism in message-driven stream programs. J Supercomput 70, 488–509 (2014). https://doi.org/10.1007/s11227-014-1264-0

  • DOI: https://doi.org/10.1007/s11227-014-1264-0
