ABSTRACT
The emergence of multicore processors has increased the need for simple parallel programming models usable by nonexperts. The ability to specify subparts of a bigger data structure is an important trait of High Productivity Programming Languages. Such a concept can also be applied to dependency-aware task-parallel programming models. In those models, tasks may have data dependencies, which are used to schedule them in parallel.
However, calculating dependencies between subparts of bigger data structures is challenging. Accessed data may be strided, and can fully or partially overlap the accesses of other tasks. Techniques that are too approximate may produce too many extra dependencies and limit parallelism. Techniques that are too precise may be impractical in terms of time and space.
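The precision trade-off can be illustrated with a minimal Python sketch. It is not the paper's technique: the `(start, stride, count)` descriptor and the function names `strided_addresses`, `bounding_overlap`, and `exact_overlap` are hypothetical, chosen only for illustration. The sketch compares a cheap, conservative interval test with an exact element-by-element test on two interleaved strided accesses.

```python
def strided_addresses(start, stride, count):
    """Element offsets touched by a 1-D strided access.

    The (start, stride, count) descriptor is an assumption made for this
    sketch; it assumes stride > 0 and count > 0.
    """
    return range(start, start + stride * count, stride)


def bounding_overlap(a, b):
    """Approximate test: do the [first, last] intervals of a and b intersect?"""
    return a[0] <= b[-1] and b[0] <= a[-1]


def exact_overlap(a, b):
    """Precise test: do a and b share at least one element?

    Enumerating every element costs time and space proportional to the
    size of the access, which is what makes naive exactness expensive.
    """
    return not set(a).isdisjoint(b)


even_elements = strided_addresses(0, 2, 8)  # 0, 2, ..., 14
odd_elements = strided_addresses(1, 2, 8)   # 1, 3, ..., 15

# The interval test reports a conflict that does not exist ...
spurious = bounding_overlap(even_elements, odd_elements)
# ... while the exact test shows the two accesses are independent.
real = exact_overlap(even_elements, odd_elements)
```

Here the interval test would serialize two tasks that never touch the same element, while the exact test avoids the spurious dependency at the cost of enumerating every accessed element.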
We present the abstractions, data structures and algorithms to calculate dependencies between tasks with strided and possibly different memory access patterns. Our technique is performed at run time from a description of the inputs and outputs of each task and is affected by neither pointer arithmetic nor reshaping. We demonstrate how it can be applied to increase programming productivity. We also demonstrate that scalability is comparable to other solutions and in some cases higher due to better parallelism extraction.
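As a rough illustration of deriving dependencies at run time from declared inputs and outputs, the following Python sketch tracks tasks in submission order and reports which earlier tasks each new task must wait for. It is an assumption-laden toy, not the data structures or algorithms of the paper: regions are stored as explicit sets of element offsets, and the names `region`, `DependencyTracker`, and `submit` are hypothetical.

```python
def region(start, stride, count):
    """Set of element offsets for a 1-D strided access (hypothetical descriptor)."""
    return frozenset(range(start, start + stride * count, stride))


class DependencyTracker:
    """Naive run-time dependency calculation over explicit element sets."""

    def __init__(self):
        self.tasks = []  # (name, reads, writes) in submission order

    def submit(self, name, reads=frozenset(), writes=frozenset()):
        """Register a task; return names of earlier tasks it depends on."""
        deps = []
        for prev_name, prev_reads, prev_writes in self.tasks:
            read_after_write = reads & prev_writes
            write_after_read = writes & prev_reads
            write_after_write = writes & prev_writes
            if read_after_write or write_after_read or write_after_write:
                deps.append(prev_name)
        self.tasks.append((name, reads, writes))
        return deps


tracker = DependencyTracker()
# Two writers with interleaved strided outputs: no shared element.
deps_even = tracker.submit("init_even", writes=region(0, 2, 4))  # {0, 2, 4, 6}
deps_odd = tracker.submit("init_odd", writes=region(1, 2, 4))    # {1, 3, 5, 7}
# A contiguous reader that partially overlaps both writers.
deps_sum = tracker.submit("sum_front", reads=region(0, 1, 4))    # {0, 1, 2, 3}
# An in-place update of the other half: independent of sum_front.
deps_scale = tracker.submit(
    "scale_back", reads=region(4, 1, 4), writes=region(4, 1, 4)  # {4, 5, 6, 7}
)
```

In this sketch `init_even` and `init_odd` can run in parallel, and `sum_front` and `scale_back` each wait for both writers but not for each other. Comparing every pair of tasks over explicit element sets scales poorly; it only shows the kind of information a run-time dependency calculation has to produce.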
Index Terms
- Handling task dependencies under strided and aliased references