Abstract
Stream processing is a special form of the dataflow execution model that offers extensive opportunities for optimization and automatic parallelization. To take full advantage of the paradigm, programmers are typically required to learn a new language and re-implement their applications. This work shows that it is possible to exploit streaming as a safe and automatic optimization of a more general dataflow-based model, one in which computation kernels are written in standard, general-purpose languages and organized as a coordination graph. We propose streaming concurrent collections (SCnC), a streaming system that can efficiently run a subset of the programs supported by concurrent collections (CnC). CnC is a general-purpose parallel programming paradigm that integrates task parallelism and dataflow computing. The proposed streaming support allows application developers to reason about their program as a general dataflow graph, while benefiting from the performance and tight memory footprint of stream parallelism when their program satisfies streaming constraints. In this paper, we formally define the application requirements for using SCnC and outline a static decision procedure for identifying and processing eligible SCnC subgraphs. We present initial results showing that transitioning from general CnC to SCnC leads to a throughput increase of up to 40× for certain benchmarks, and also enables programs with large data sizes to execute within available memory in cases where CnC execution may run out of memory.
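To make the memory argument concrete, here is a toy Python sketch contrasting a dataflow-style item collection, which keeps every item keyed by its tag, with a bounded FIFO stream whose consumed items are discarded. It is not the CnC or SCnC API: the class and function names (`ItemCollection`, `BoundedStream`, `run_dataflow`, `run_streaming`) are hypothetical, and it assumes the simplest streaming-eligible case, one producer and one consumer, where the consumer for tag i reads only the item produced for tag i.

```python
from collections import deque


class ItemCollection:
    """Toy dataflow item collection (hypothetical, not the CnC API).

    Single-assignment map from tag to value; items are never freed,
    so memory grows with the number of tags produced.
    """

    def __init__(self):
        self._items = {}
        self.peak = 0  # largest number of items held at once

    def put(self, tag, value):
        if tag in self._items:
            raise ValueError(f"tag {tag} was already put (single assignment)")
        self._items[tag] = value
        self.peak = max(self.peak, len(self._items))

    def get(self, tag):
        return self._items[tag]


class BoundedStream:
    """Toy FIFO stream with fixed capacity; popped items are discarded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._buf = deque()
        self.peak = 0

    def push(self, value):
        if len(self._buf) >= self.capacity:
            # A real runtime would block the producer here instead.
            raise OverflowError("stream buffer full")
        self._buf.append(value)
        self.peak = max(self.peak, len(self._buf))

    def pop(self):
        return self._buf.popleft()


def run_dataflow(n):
    """All producer steps run first, then all consumer steps."""
    items = ItemCollection()
    for tag in range(n):
        items.put(tag, tag * tag)  # producer step for this tag
    total = sum(items.get(tag) for tag in range(n))  # consumer steps
    return total, items.peak


def run_streaming(n, capacity=1):
    """Producer and consumer alternate through a bounded buffer."""
    stream = BoundedStream(capacity)
    total = 0
    for tag in range(n):
        stream.push(tag * tag)  # producer step
        total += stream.pop()  # consumer step consumes it immediately
    return total, stream.peak


print(run_dataflow(1000))   # (332833500, 1000)
print(run_streaming(1000))  # (332833500, 1)
```

Both runs compute the same result, but the peak number of live items is n for the item collection and 1 for the stream. The all-producers-first schedule in `run_dataflow` is only one legal dataflow schedule; it was chosen to show the worst case for retained items.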
Notes
The OpenStream system, developed concurrently with this work, has a similar feature, but the OpenStream state cannot be inferred from stream accesses.
If there are multiple consumer functions, all combinations must be considered, and only the maximum of the resulting buffer sizes is safe.
As shown in Sect. 3.2, the control graph is a tree, so there is only one such path.
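The last note relies on a general property of trees: between any two nodes there is exactly one simple path. A minimal Python sketch, assuming the control graph is represented by a hypothetical parent map (each node mapped to its parent, the root mapped to `None`), recovers that path by climbing to the lowest common ancestor. The function name `path_in_tree` and the example nodes are illustrative only.

```python
def path_in_tree(parent, a, b):
    """Return the unique simple path from node a to node b in a rooted tree.

    parent maps every node to its parent; the root maps to None.
    """
    # Ancestors of a, starting at a and ending at the root.
    ancestors_a = []
    node = a
    while node is not None:
        ancestors_a.append(node)
        node = parent[node]
    position = {n: i for i, n in enumerate(ancestors_a)}

    # Climb from b until reaching an ancestor of a: the lowest common ancestor.
    climb_b = []
    node = b
    while node not in position:
        if node is None:
            raise ValueError("a and b are not in the same tree")
        climb_b.append(node)
        node = parent[node]

    # a -> ... -> LCA, then LCA -> ... -> b (the climb reversed).
    return ancestors_a[: position[node] + 1] + climb_b[::-1]


# Example control tree: r is the root with children x and y.
parent = {"r": None, "x": "r", "y": "r", "p": "x", "q": "y"}
print(path_in_tree(parent, "p", "q"))  # ['p', 'x', 'r', 'y', 'q']
```

Because every node has a single parent, the climb from b needs no search or backtracking; in a graph where nodes can have several predecessors, more than one path could exist and this shortcut would not apply.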
References
Thies, W., Karczmarek, M., Amarasinghe, S.P.: StreamIt: a language for streaming applications. In: CC ’02, pp. 179–196. Springer, London (2002)
Budimlic, Z., Burke, M., Cavé, V., Knobe, K., Lowney, G., Newton, R., Palsberg, J., Peixotto, D.M., Sarkar, V., Schlimbach, F., Tasirlar, S.: Concurrent collections. Sci. Program. 18(3–4), 203–207 (2010)
Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46, 720–748 (1999)
Agarwal, S., Barik, R., Bonachea, D., Sarkar, V., Shyamasundar, R.K., Yelick, K.: Deadlock-free scheduling of X10 computations with bounded resources. In: SPAA ’07. ACM, New York (2007)
Guo, Y., Barik, R., Raman, R., Sarkar, V.: Work-first and help-first scheduling policies for async-finish task parallelism. In: IPDPS ’09 (2009)
MathWorks: Symbolic Math Toolbox Documentation. http://www.mathworks.com/help/symbolic/index.html. Accessed Feb 2015
Li, P., Agrawal, K., Buhler, J., Chamberlain, R.D.: Deadlock avoidance for streaming computations with filtering. In: SPAA ’10 (2010)
Li, P., Agrawal, K., Buhler, J., Chamberlain, R.D., Lancaster, J.M.: Deadlock-avoidance for streaming applications with split-join structure: two case studies. In: ASAP, pp. 333–336 (2010)
Soulé, R., Gordon, M.I., Amarasinghe, S., Grimm, R., Hirzel, M.: Hitting the sweet spot for streaming languages. Technical Report TR2012-948, New York University Computer Science (2009)
Cavé, V., Zhao, J., Shirako, J., Sarkar, V.: Habanero-Java: the new adventures of old X10. In: Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, PPPJ ’11 (2011)
Shirako, J., Peixotto, D.M., Sarkar, V., Scherer, W.N.: Phasers: a unified deadlock-free construct for collective and point-to-point synchronization. In: ICS ’08, pp. 277–288. ACM, New York (2008)
Shirako, J., Peixotto, D.M., Sarkar, V., Scherer, W.N.: Phaser accumulators: a new reduction construct. In: IPDPS ’09 (2009)
Georges, A., Buytaert, D., Eeckhout, L.: Statistically rigorous Java performance evaluation. In: OOPSLA ’07, pp. 57–76. ACM (2007)
Meyerson, A.: Online facility location. In: FOCS ’01 (2001)
Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–698 (1986)
Nijhuis, M., Bos, H., Bal, H.E.: A component-based coordination language for efficient reconfigurable streaming applications. In: ICPP (2007)
Nijhuis, M.: Framework for parallel streaming applications. Ph.D. dissertation (2007)
Auerbach, J., Bacon, D.F., Cheng, P., Rabbah, R.: Lime: a Java-compatible and synthesizable language for heterogeneous architectures. In: OOPSLA ’10, pp. 89–108. ACM, New York (2010)
Liao, S., Du, Z., Wu, G., Lueh, G.-Y.: Data and computation transformations for Brook streaming applications on multiprocessors. In: CGO ’06, pp. 196–207. IEEE Computer Society, Washington (2006)
Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., Hanrahan, P.: Brook for GPUs: stream computing on graphics hardware. In: SIGGRAPH ’04, pp. 777–786. ACM, New York (2004)
Aoyagi, Y., Uehara, M., Mori, H.: A case study on predictive method of task allocation in stream-based computing. In: Proceedings of the 13th International Conference on Information Networking, ICOIN ’98 (1998)
Collins, R.L., Carloni, L.P.: Flexible filters: load balancing through backpressure for stream programs. In: EMSOFT ’09 (2009)
Aleen, F., Sharif, M., Pande, S.: Input-driven dynamic execution prediction of streaming applications. In: PPoPP ’10, pp. 315–324 (2010)
Miranda, C., Pop, A., Dumont, P., Cohen, A., Duranton, M.: Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes. In: CASES ’10, pp. 11–20. ACM (2010)
Vandierendonck, H., Tzenakis, G., Nikolopoulos, D.S.: A unified scheduler for recursive and task dataflow parallelism. In: PACT ’11 (2011)
Pop, A., Cohen, A.: OpenStream: expressiveness and data-flow compilation of OpenMP streaming programs. In: TACO ’13 (2013)
About this article
Cite this article
Sbîrlea, D., Shirako, J., Newton, R. et al. SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism. Int J Parallel Prog 44, 233–256 (2016). https://doi.org/10.1007/s10766-015-0353-x
DOI: https://doi.org/10.1007/s10766-015-0353-x