Handling task dependencies under strided and aliased references

Authors:
Josep M. Perez, Universitat Politècnica de Catalunya (UPC-DAC)
Rosa M. Badia, Spanish National Research Council (CSIC - IIIA)
Jesus Labarta, Universitat Politècnica de Catalunya (UPC-DAC)

ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing, June 2010, Pages 263–274
https://doi.org/10.1145/1810085.1810122

Published: 02 June 2010

ABSTRACT

The emergence of multicore processors has increased the need for simple parallel programming models usable by non-experts. The ability to specify subparts of a larger data structure is an important trait of High Productivity Programming Languages. The same concept can be applied to dependency-aware task-parallel programming models, in which tasks may have data dependencies that are used to schedule them in parallel.

However, calculating dependencies between subparts of bigger data structures is challenging. Accessed data may be strided, and can fully or partially overlap the accesses of other tasks. Techniques that are too approximate may produce too many extra dependencies and limit parallelism. Techniques that are too precise may be impractical in terms of time and space.
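The trade-off between approximate and precise overlap tests can be illustrated with a minimal Python sketch. It is not the paper's technique: the `StridedAccess` descriptor and both test functions are hypothetical names introduced here, and elements are modeled as integer indices into a flat array. The approximate test compares bounding intervals only, while the exact test enumerates every accessed element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedAccess:
    """Hypothetical descriptor: `count` blocks of `length` elements, `stride` apart."""
    base: int
    length: int
    stride: int
    count: int

    def bounds(self):
        # Smallest half-open interval [lo, hi) covering every accessed element.
        return self.base, self.base + (self.count - 1) * self.stride + self.length

    def elements(self):
        # Exact set of accessed element indices; cost grows with the access size.
        return {
            self.base + block * self.stride + offset
            for block in range(self.count)
            for offset in range(self.length)
        }


def overlaps_approximate(a, b):
    """Conservative test: compares bounding intervals only."""
    a_lo, a_hi = a.bounds()
    b_lo, b_hi = b.bounds()
    return a_lo < b_hi and b_lo < a_hi


def overlaps_exact(a, b):
    """Precise test by enumeration; time and space are linear in the elements touched."""
    return not a.elements().isdisjoint(b.elements())


# Two different columns of a 4x4 row-major matrix.
column0 = StridedAccess(base=0, length=1, stride=4, count=4)
column1 = StridedAccess(base=1, length=1, stride=4, count=4)
# A 2x2 block in the top-left corner, which shares elements with both columns.
corner = StridedAccess(base=0, length=2, stride=4, count=2)
```

Here `overlaps_approximate(column0, column1)` reports an overlap even though the two columns share no element, which would serialize two independent tasks. `overlaps_exact` gives the right answer for that pair and for `corner`, but only by materializing every element, which does not scale to large accesses.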

We present the abstractions, data structures, and algorithms needed to calculate dependencies between tasks with strided and possibly different memory access patterns. Our technique operates at run time from a description of the inputs and outputs of each task and is affected by neither pointer arithmetic nor reshaping. We demonstrate how it can be applied to increase programming productivity. We also demonstrate that its scalability is comparable to that of other solutions and in some cases higher, due to better parallelism extraction.
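As a rough illustration of deriving dependencies at run time from declared inputs and outputs, the following Python sketch registers tasks in program order and reports which earlier tasks each new task must wait for. It is a toy under stated assumptions, not the data structures or algorithms of the paper: `strided`, `DependencyTracker`, and the task names are hypothetical, accesses are modeled as sets of element indices in a flat array, and every new task is compared against all earlier ones.

```python
def strided(base, length, stride, count):
    """Element indices of `count` blocks of `length` elements, `stride` apart."""
    return frozenset(
        base + block * stride + offset
        for block in range(count)
        for offset in range(length)
    )


class DependencyTracker:
    """Toy run-time tracker (hypothetical); tasks are registered in program order."""

    def __init__(self):
        self.tasks = []  # (name, reads, writes) for every task seen so far

    def add_task(self, name, inputs=(), outputs=()):
        reads = frozenset().union(*inputs)
        writes = frozenset().union(*outputs)
        predecessors = []
        for earlier, earlier_reads, earlier_writes in self.tasks:
            raw = reads & earlier_writes    # read after write
            war = writes & earlier_reads    # write after read
            waw = writes & earlier_writes   # write after write
            if raw or war or waw:
                predecessors.append(earlier)
        self.tasks.append((name, reads, writes))
        return predecessors


# A 4x4 row-major matrix described through differently shaped accesses.
col0 = strided(base=0, length=1, stride=4, count=4)
col1 = strided(base=1, length=1, stride=4, count=4)
row0 = strided(base=0, length=4, stride=4, count=1)
flat = strided(base=0, length=16, stride=16, count=1)  # same memory viewed as a vector

tracker = DependencyTracker()
deps_a = tracker.add_task("scale_col0", inputs=[col0], outputs=[col0])
deps_b = tracker.add_task("scale_col1", inputs=[col1], outputs=[col1])
deps_c = tracker.add_task("sum_row0", inputs=[row0])
deps_d = tracker.add_task("clear_all", outputs=[flat])
```

The two column tasks have no dependency on each other and could run in parallel, while `sum_row0` must wait for both because the row crosses both columns. `clear_all` depends on all three earlier tasks even though it describes the matrix with a different shape. Because accesses are compared by the elements they touch, the shape used to describe them does not matter, but the exhaustive comparison costs time and space proportional to the data accessed.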

Published in
ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing
June 2010
365 pages
ISBN: 9781450300186
DOI: 10.1145/1810085
General Chair: Taisuke Boku (University of Tsukuba)
Program Chairs: Hiroshi Nakashima (Kyoto University), Avi Mendelson (Microsoft)
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dependencies
discontiguous data
domains
parallelism
region tree
regions
tasks
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 584 of 2,055 submissions, 28%