SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism

International Journal of Parallel Programming

Abstract

Stream processing is a special form of the dataflow execution model that offers extensive opportunities for optimization and automatic parallelization. To take full advantage of the paradigm, programmers are typically required to learn a new language and re-implement their applications. This work shows that it is possible to exploit streaming as a safe and automatic optimization of a more general dataflow-based model, one in which computation kernels are written in standard, general-purpose languages and organized as a coordination graph. We propose streaming concurrent collections (SCnC), a streaming system that can efficiently run a subset of the programs supported by concurrent collections (CnC). CnC is a general-purpose parallel programming paradigm that integrates task parallelism and dataflow computing. The proposed streaming support allows application developers to reason about their program as a general dataflow graph, while benefiting from the performance and tight memory footprint of stream parallelism when their program satisfies streaming constraints. In this paper, we formally define the application requirements for using SCnC and outline a static decision procedure for identifying and processing eligible SCnC subgraphs. We present initial results showing that transitioning from general CnC to SCnC increases throughput by up to 40× for certain benchmarks and also enables programs with large data sizes to execute in available memory in cases where CnC execution may run out of memory.

Notes

  1. The OpenStream system, developed concurrently with this work, has a similar feature, but the OpenStream state cannot be inferred from stream accesses.

  2. If there are multiple consumer functions, all combinations must be considered, and only the maximum buffer size obtained across them is safe.

  3. As shown in Sect. 3.2, the control graph is a tree, so there is only one such path.

References

  1. Thies, W., Karczmarek, M., Amarasinghe, S.P.: StreamIt: a language for streaming applications. In: CC ’02, pp. 179–196. Springer, London

  2. Budimlic, Z., Burke, M., Cavé, V., Knobe, K., Lowney, G., Newton, R., Palsberg, J., Peixotto, D.M., Sarkar, V., Schlimbach, F., Tasirlar, S.: Concurrent collections. Sci. Program. 18(3–4), 203–207 (2010)

  3. Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46, 720–748 (1999)

  4. Agarwal, S., Barik, R., Bonachea, D., Sarkar, V., Shyamasundar, R.K., Yelick, K.: Deadlock-free scheduling of X10 computations with bounded resources. In: SPAA ’07. ACM, New York

  5. Guo, Y., Barik, R., Raman, R., Sarkar, V.: Work-first and help-first scheduling policies for async-finish task parallelism. In: IPDPS ’09

  6. MathWorks Symbolic Math Toolbox Documentation. http://www.mathworks.com/help/symbolic/index.html. Accessed Feb 2015

  7. Li, P., Agrawal, K., Buhler, J., Chamberlain, R.D.: Deadlock avoidance for streaming computations with filtering. In: SPAA ’10

  8. Li, P., Agrawal, K., Buhler, J., Chamberlain, R.D., Lancaster, J.M.: Deadlock-avoidance for streaming applications with split-join structure: two case studies. In: ASAP, pp. 333–336 (2010)

  9. Soulé, R., Gordon, M.I., Amarasinghe, S., Grimm, R., Hirzel, M.: Hitting the Sweet Spot for Streaming Languages. NY University CS Technical Report TR2012-948 (2009)

  10. Cavé, V., Zhao, J., Shirako, J., Sarkar, V.: Habanero-Java: the new adventures of old X10. In: Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, PPPJ ’11 (2011)

  11. Shirako, J., Peixotto, D.M., Sarkar, V., Scherer, W.N.: Phasers: a unified deadlock-free construct for collective and point-to-point synchronization. In: ICS ’08, pp. 277–288, ACM, New York

  12. Shirako, J., Peixotto, D.M., Sarkar, V., Scherer, W.N.: Phaser accumulators: a new reduction construct. In: IPDPS ’09

  13. Georges, A., Buytaert, D., Eeckhout, L.: Statistically rigorous Java performance evaluation. In: OOPSLA ’07, pp. 57–76. ACM

  14. Meyerson, A.: Online facility location. In: FOCS ’01

  15. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–698 (1986)

  16. Nijhuis, M., Bos, H., Bal, H.E.: A component-based coordination language for efficient reconfigurable streaming applications. In: ICPP (2007)

  17. Nijhuis, M.: Framework for parallel streaming applications. Ph.D. dissertation (2007)

  18. Auerbach, J., Bacon, D.F., Cheng, P., Rabbah, R.: Lime: a Java-compatible and synthesizable language for heterogeneous architectures. In: OOPSLA ’10, pp. 89–108, ACM, New York

  19. Liao, S., Du, Z., Wu, G., Lueh, G.-Y.: Data and computation transformations for Brook streaming applications on multiprocessors. In: CGO ’06, pp. 196–207, IEEE Computer Society, Washington

  20. Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., Hanrahan, P.: Brook for GPUs: stream computing on graphics hardware. In: SIGGRAPH ’04, pp. 777–786, ACM, New York (2004)

  21. Aoyagi, Y., Uehara, M., Mori, H.: A case study on predictive method of task allocation in stream-based computing. In: Proceedings of the 13th International Conference on Information Networking, ICOIN ’98

  22. Collins, R.L., Carloni, L.P.: Flexible filters: load balancing through backpressure for stream programs. In: EMSOFT ’09

  23. Aleen, F., Sharif, M., Pande, S.: Input-driven dynamic execution prediction of streaming applications. In: PPoPP ’10, pp. 315–324

  24. Miranda, C., Pop, A., Dumont, P., Cohen, A., Duranton, M.: Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes. In: CASES ’10, pp. 11–20. ACM

  25. Vandierendonck, H., Tzenakis, G., Nikolopoulos, D.S.: A unified scheduler for recursive and task dataflow parallelism. In: PACT ’11

  26. Pop, A., Cohen, A.: OpenStream: expressiveness and data-flow compilation of OpenMP streaming programs. In: TACO ’13

Author information

Corresponding author

Correspondence to Dragoş Sbîrlea.

About this article

Cite this article

Sbîrlea, D., Shirako, J., Newton, R. et al. SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism. Int J Parallel Prog 44, 233–256 (2016). https://doi.org/10.1007/s10766-015-0353-x

  • DOI: https://doi.org/10.1007/s10766-015-0353-x
