DOI: 10.1145/1810085.1810122
research-article

Handling task dependencies under strided and aliased references

Published: 02 June 2010

ABSTRACT

The emergence of multicore processors has increased the need for simple parallel programming models usable by nonexperts. The ability to specify subparts of a bigger data structure is an important trait of High Productivity Programming Languages. Such a concept can also be applied to dependency-aware task-parallel programming models. In those paradigms, tasks may have data dependencies, and those are used for scheduling them in parallel.

However, calculating dependencies between subparts of bigger data structures is challenging. Accessed data may be strided, and can fully or partially overlap the accesses of other tasks. Techniques that are too approximate may produce too many extra dependencies and limit parallelism. Techniques that are too precise may be impractical in terms of time and space.
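The trade-off between approximate and precise overlap tests can be illustrated with a minimal sketch. This is not the technique of the paper: the `StridedRegion` descriptor, its field names, and both overlap functions are hypothetical, and a one-dimensional byte-addressed layout is assumed. The approximate test compares only the enclosing address ranges, so it is cheap but reports overlaps that do not exist; the exact test compares every pair of blocks, so it is precise but quadratic in the number of blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedRegion:
    """Hypothetical descriptor: `count` blocks of `length` bytes, `stride` bytes apart."""
    start: int
    length: int
    stride: int
    count: int

    def bounds(self):
        # Half-open byte interval that encloses every block of the region.
        return self.start, self.start + (self.count - 1) * self.stride + self.length

    def blocks(self):
        # Half-open byte interval of each individual block.
        for i in range(self.count):
            lo = self.start + i * self.stride
            yield lo, lo + self.length


def overlaps_approx(a, b):
    """Cheap but conservative: compare only the enclosing intervals."""
    a_lo, a_hi = a.bounds()
    b_lo, b_hi = b.bounds()
    return a_lo < b_hi and b_lo < a_hi


def overlaps_exact(a, b):
    """Precise but O(count_a * count_b): compare every pair of blocks."""
    return any(a_lo < b_hi and b_lo < a_hi
               for a_lo, a_hi in a.blocks()
               for b_lo, b_hi in b.blocks())


# Two column blocks of a row-major 4x8 byte matrix: columns 0-3 and columns 4-7.
left = StridedRegion(start=0, length=4, stride=8, count=4)
right = StridedRegion(start=4, length=4, stride=8, count=4)

print(overlaps_approx(left, right))  # True: an extra dependency would be added
print(overlaps_exact(left, right))   # False: the two tasks could run in parallel
```

In this example the two tasks touch interleaved but disjoint bytes, so the approximate test would serialize them needlessly, while the exact test pays for its precision with a pairwise scan.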

We present the abstractions, data structures and algorithms to calculate dependencies between tasks with strided and possibly different memory access patterns. Our technique operates at run time on a description of the inputs and outputs of each task, and is affected by neither pointer arithmetic nor reshaping. We demonstrate how it can be applied to increase programming productivity. We also demonstrate that its scalability is comparable to that of other solutions, and in some cases higher due to better parallelism extraction.

Published in

ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing
June 2010
365 pages
ISBN: 9781450300186
DOI: 10.1145/1810085

Copyright © 2010 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
Acceptance Rates

Overall Acceptance Rate: 584 of 2,055 submissions, 28%