Buffer Sizing for Self-timed Stream Programs on Heterogeneous Distributed Memory Multiprocessors

Carpenter, Paul M.; Ramirez, Alex; Ayguadé, Eduard

doi:10.1007/978-3-642-11515-8_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5952)

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

Abstract

Stream programming is a promising way to expose concurrency to the compiler. A stream program is built from kernels that communicate only via point-to-point streams. The stream compiler statically allocates these kernels to processors, applying blocking, fission and fusion transformations. The compiler also determines the size of each communication buffer; these sizes affect performance because local memories can be small.
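To illustrate why buffer sizes affect the throughput of a self-timed program, a common timed-dataflow model (not necessarily the one used in the paper) represents a bounded FIFO of capacity B between a producer P and a consumer C as two edges: a forward "data" edge with no initial tokens and a backward "space" edge carrying B tokens. The steady-state period is then the largest ratio of total execution time to total tokens over any cycle, here max(t_P, t_C, (t_P + t_C)/B). The sketch below assumes one token produced and consumed per firing; the names `max_cycle_ratio` and `pipeline_period` are hypothetical, and the cycles are enumerated exhaustively, which suits only tiny graphs (efficient cycle-mean algorithms, such as Karp's in the references, exist for real ones).

```python
from fractions import Fraction


def max_cycle_ratio(times, edges):
    """Largest (sum of node times) / (sum of edge tokens) over all simple cycles.

    times: dict mapping node -> execution time (int or Fraction)
    edges: list of (src, dst, tokens) tuples
    Cycles are enumerated exhaustively, so this is only meant for tiny graphs.
    """
    nodes = sorted(times)
    index = {n: i for i, n in enumerate(nodes)}
    succ = {n: [] for n in nodes}
    for src, dst, tokens in edges:
        succ[src].append((dst, tokens))

    best = Fraction(0)

    def dfs(start, node, visited, time_sum, token_sum):
        nonlocal best
        for nxt, tokens in succ[node]:
            if nxt == start:
                # Closed a cycle back to its lowest-indexed node.
                total = token_sum + tokens
                if total == 0:
                    raise ValueError("cycle without tokens: the graph deadlocks")
                best = max(best, Fraction(time_sum, total))
            elif index[nxt] > index[start] and nxt not in visited:
                # Only visit higher-indexed nodes so each cycle is found once.
                visited.add(nxt)
                dfs(start, nxt, visited, time_sum + times[nxt], token_sum + tokens)
                visited.remove(nxt)

    for start in nodes:
        dfs(start, start, {start}, times[start], 0)
    return best


def pipeline_period(t_producer, t_consumer, capacity):
    """Steady-state period of producer P -> FIFO(capacity) -> consumer C."""
    times = {"P": t_producer, "C": t_consumer}
    edges = [
        ("P", "P", 1),         # P's firings do not overlap each other
        ("C", "C", 1),         # C's firings do not overlap each other
        ("P", "C", 0),         # data: C's k-th firing needs P's k-th token
        ("C", "P", capacity),  # space: P may run at most `capacity` firings ahead
    ]
    return max_cycle_ratio(times, edges)


if __name__ == "__main__":
    for b in (1, 2, 3):
        print(b, pipeline_period(3, 2, b))
```

With t_P = 3 and t_C = 2, a single-slot FIFO forces the two kernels to alternate (period 5), two slots let them overlap so the period drops to the producer's time (3), and a third slot gains nothing.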

In this paper, we propose a feedback-directed algorithm that determines the size of each communication buffer, based on i) the stream program that has been mapped onto processors, ii) feedback from an earlier execution, and iii) the memory constraints. The algorithm exposes a trade-off between throughput and latency. It is general, in that it applies to stream programs with unstructured stream graphs, and it supports variable execution times and communication rates.
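The general shape of a feedback-directed sizing loop can be sketched as follows. This is a hypothetical greedy hill-climber, not the authors' algorithm: it handles only a linear chain of kernels (the paper supports unstructured graphs), charges every buffer slot the same memory cost, and uses per-firing execution times from a previous run as a stand-in for the feedback. It replays that trace under candidate capacities and grants one extra slot at a time to whichever FIFO shortens the simulated run most. The names `simulate`, `size_buffers`, `slot_budget` and the trace values are illustrative assumptions; in a real flow, the program would be re-run with the new sizes to collect fresh timings.

```python
def simulate(exec_times, capacities):
    """Self-timed execution of a linear pipeline with bounded FIFOs.

    exec_times[i][j]: measured duration of firing j of kernel i (the feedback).
    capacities[i]: slots in the FIFO from kernel i to kernel i + 1 (each >= 1).
    Assumes one processor per kernel, one token consumed and produced per firing,
    and that a slot is freed when the consumer firing that read it finishes.
    Returns the finish time of the last firing of the last kernel.
    """
    n = len(exec_times)
    firings = len(exec_times[0])
    end = [[0.0] * firings for _ in range(n)]
    for j in range(firings):
        for i in range(n):
            start = end[i][j - 1] if j > 0 else 0.0      # previous firing done
            if i > 0:
                start = max(start, end[i - 1][j])        # input token available
            if i < n - 1 and j >= capacities[i]:
                # Output slot: the consumer must have finished firing j - capacity.
                start = max(start, end[i + 1][j - capacities[i]])
            end[i][j] = start + exec_times[i][j]
    return end[-1][-1]


def size_buffers(exec_times, slot_budget):
    """Greedy sizing: start each FIFO at one slot, then repeatedly grant one more
    slot to the FIFO whose growth most shortens the simulated run, stopping when
    the budget is spent or no single extra slot helps."""
    n_fifos = len(exec_times) - 1
    if slot_budget < n_fifos:
        raise ValueError("budget must give every FIFO at least one slot")
    caps = [1] * n_fifos
    best = simulate(exec_times, caps)
    while n_fifos and sum(caps) < slot_budget:
        trials = []
        for k in range(n_fifos):
            trial = caps.copy()
            trial[k] += 1
            trials.append((simulate(exec_times, trial), k))
        makespan, k = min(trials)
        if makespan >= best:
            break
        caps[k] += 1
        best = makespan
    return caps, best


if __name__ == "__main__":
    # Hypothetical trace: two kernels whose per-firing times alternate.
    trace = [[1, 3, 1, 3], [3, 1, 3, 1]]
    print(size_buffers(trace, slot_budget=4))
```

For this trace, one slot keeps the kernels in lockstep (finish time 16), two slots let each kernel absorb the other's variation (finish time 9), and a third slot brings no further gain, so the loop stops with part of the budget unused.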

We show results for the StreamIt benchmarks and random graphs. For the StreamIt benchmarks, throughput is optimal after the first iteration of the algorithm. For random graphs with stochastic computation times, throughput is within 3% of optimal after four iterations. Compared with the previous general algorithm, by Basten and Hoogerbrugge, our algorithm achieves significantly better throughput and lower latency.

References

Olukotun, K., Hammond, L.: The future of microprocessors. Queue 3(7), 26–29 (2005)
Pham, D., Behnen, E., Bolliger, M., Hofstee, H., et al.: The design methodology and implementation of a first-generation Cell processor: a multi-core SoC. In: Custom Integrated Circuits Conference 2005, pp. 45–49 (2005)
Kudlur, M., Mahlke, S.: Orchestrating the execution of stream programs on multicore platforms. In: Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, pp. 114–124 (2008)
Choi, Y., Lin, Y., Chong, N., Mahlke, S., Mudge, T.: Stream Compilation for Real-Time Embedded Multicore Systems. In: Proceedings of the 2009 International Symposium on Code Generation and Optimization, vol. 00, pp. 210–220 (2009)
ACOTES (IST-034869): Advanced Compiler Technologies for Embedded Streaming, http://www.hitech-projects.com/euprojects/ACOTES/
ACOTES: IST ACOTES Project Deliverable D2.2 Report on Streaming Programming Model and Abstract Streaming Machine Description Final Version (2008)
Becchi, M., Crowley, P.: Dynamic thread assignment on heterogeneous multiprocessor architectures. In: Proceedings of the 3rd conference on Computing frontiers, pp. 29–40. ACM, New York (2006)
Hofstee, H.P.: Power efficient processor architecture and the cell processor, pp. 258–262. IEEE Computer Society, Los Alamitos (2005)
Parks, T.: Bounded scheduling of process networks. PhD thesis, University of California (1995)
Buck, J.: Scheduling dynamic dataflow graphs with bounded memory using the token flow model. PhD thesis, University of California (1993)
Geilen, M., Basten, T.: Requirements on the execution of Kahn process networks. LNCS, pp. 319–334. Springer, Heidelberg (2003)
van der Wolf, P., de Kock, E., Henriksson, T., Kruijtzer, W., Essink, G.: Design and programming of embedded multiprocessors: an interface-centric approach. In: Proceedings of the 2nd international conference on Hardware/software codesign and system synthesis, pp. 206–217 (2004)
Carpenter, P.M., Ramirez, A., Ayguade, E.: The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors. In: SAMOS Workshop 2009, pp. 12–23. Springer, Heidelberg (2009)
Ito, K., Parhi, K.: Determining the minimum iteration period of an algorithm. The Journal of VLSI Signal Processing 11(3), 229–244 (1995)
Dasdan, A., Gupta, R.: Faster maximum and minimum mean cycle algorithms for system-performance analysis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 17(10), 889–899 (1998)
Govindarajan, R., Gao, G.: A novel framework for multi-rate scheduling in DSP applications. In: International Conference on Application-Specific Array Processors, pp. 77–88 (1993)
Lee, E., Messerschmitt, D.: Synchronous data flow. Proceedings of the IEEE 75(9), 1235–1245 (1987)
Lee, E.A.: A coupled hardware and software architecture for programmable digital signal processors (synchronous data flow). PhD thesis (1986)
Karp, R.: A characterization of the minimum cycle mean in a digraph. Discrete Mathematics 23(3), 309–311 (1978)
Pollack, M.: The maximum capacity through a network. Operations Research, 733–736 (1960)
Fredman, M., Tarjan, R.: Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM (J. ACM) 34(3), 596–615 (1987)
Vassilevska, V., Williams, R., Yuster, R.: All-pairs bottleneck paths for general graphs in truly sub-cubic time. In: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pp. 585–589. ACM, New York (2007)
Basten, T., Hoogerbrugge, J.: Efficient execution of process networks. Communicating Process Architectures (2001)
Gordon, M., Thies, W., Amarasinghe, S.: Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. ASPLOS, 151–162 (2006)
Carpenter, P.M., Ramirez, A., Ayguade, E.: Mapping Stream Programs onto Heterogeneous Multiprocessor Systems. In: CASES 2009, October 11-16 (2009)
Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach, 4th edn. Morgan Kaufmann, San Francisco (2007)
Stuijk, S., Geilen, M., Basten, T.: Exploring trade-offs in buffer requirements and throughput constraints for synchronous dataflow graphs. In: Proceedings of the 43rd annual conference on Design automation, pp. 899–904 (2006)

Author information

Authors and Affiliations

Barcelona Supercomputing Center, C/Jordi Girona, 31, 08034, Barcelona, Spain
Paul M. Carpenter, Alex Ramirez & Eduard Ayguadé

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station C0803, TX 78712-0240, Austin, USA
Yale N. Patt
Dipartimento di Ingegneria della Informazione, Università di Pisa, Via Diotisalvi 2, 56100, Pisa, Italy
Pierfrancesco Foglia
IBM T.J. Watson Research Center, 19 Skyline Drive, NY 10532, Hawthorne, USA
Evelyn Duesterwald
Hewlett-Packard, Cami de Can Graells 1-21, Sant Cugat del Vallés, 08174, Barcelona, Spain
Paolo Faraboschi
Computer Architecture Department, Technical University of Catalunya (UPC), c/Jordi Girona 1-3, 08034, Barcelona, Spain
Xavier Martorell

About this paper

Cite this paper

Carpenter, P.M., Ramirez, A., Ayguadé, E. (2010). Buffer Sizing for Self-timed Stream Programs on Heterogeneous Distributed Memory Multiprocessors. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_9

DOI: https://doi.org/10.1007/978-3-642-11515-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11514-1
Online ISBN: 978-3-642-11515-8
eBook Packages: Computer Science, Computer Science (R0)