ABSTRACT
Tuning applications for multicore systems involves subtle concurrency concepts and target-dependent optimizations. This paper advocates a streaming execution model, called ER, in which persistent processes communicate and synchronize through a multi-consumer […] processing applications, we demonstrate the scalability and efficiency advantages of streaming over data-driven scheduling. To exploit these benefits in compilers for parallel languages, we propose an intermediate representation that enables the compilation of data-flow tasks into streaming processes. This intermediate representation also facilitates the application of classical compiler optimizations to concurrent programs.
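The streaming idea can be sketched in a few lines. This is a minimal illustration, assuming a bounded stream with one producer and a fixed set of consumers, where every consumer reads every element in order. The `Stream` class, its `push`/`pop` methods and the `run` driver are hypothetical names for illustration only. They are not the ER/Erbium interface, and the sketch leaves out multiple producers and any windowed access. The point is that processes stay alive for the whole computation and synchronize only through the stream, instead of a new task being scheduled for each data item.

```python
import threading
from typing import Any, Callable, List

_END = object()  # hypothetical end-of-stream marker


class Stream:
    """Hypothetical bounded FIFO with one producer and n consumers.

    Every consumer reads every element, in order (broadcast semantics).
    The producer blocks once the slowest consumer lags `capacity` items behind.
    """

    def __init__(self, capacity: int, n_consumers: int) -> None:
        self.capacity = capacity
        self.buf: List[Any] = [None] * capacity
        self.write_pos = 0                  # total number of items published
        self.read_pos = [0] * n_consumers   # items consumed, per consumer
        self.cv = threading.Condition()

    def push(self, item: Any) -> None:
        with self.cv:
            # Wait until the slot to overwrite has been read by all consumers.
            while self.write_pos - min(self.read_pos) >= self.capacity:
                self.cv.wait()
            self.buf[self.write_pos % self.capacity] = item
            self.write_pos += 1
            self.cv.notify_all()

    def pop(self, consumer: int) -> Any:
        with self.cv:
            # Wait until this consumer has an unread item.
            while self.read_pos[consumer] == self.write_pos:
                self.cv.wait()
            item = self.buf[self.read_pos[consumer] % self.capacity]
            self.read_pos[consumer] += 1
            self.cv.notify_all()  # may unblock a waiting producer
            return item


def producer(stream: Stream, n: int) -> None:
    # One persistent process emits the whole sequence, then an end marker.
    for i in range(n):
        stream.push(i)
    stream.push(_END)


def consumer(stream: Stream, cid: int, out: List[int],
             f: Callable[[int], int]) -> None:
    # A persistent process loops until end of stream; it is not re-spawned per item.
    while True:
        item = stream.pop(cid)
        if item is _END:
            return
        out.append(f(item))


def run(n: int = 10, capacity: int = 4):
    stream = Stream(capacity, n_consumers=2)
    squares: List[int] = []
    doubles: List[int] = []
    threads = [
        threading.Thread(target=producer, args=(stream, n)),
        threading.Thread(target=consumer, args=(stream, 0, squares, lambda x: x * x)),
        threading.Thread(target=consumer, args=(stream, 1, doubles, lambda x: 2 * x)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return squares, doubles
```

Calling `run(10, 4)` starts three long-lived threads. Both consumers see the same ten items in the same order, whatever the thread interleaving. The output is therefore deterministic, and the bounded buffer keeps memory use fixed however far the producer runs ahead.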
Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes