Simple optimizations for an applicative array language for graphics processors

Author:
Bradford Larsen

Tufts University, Medford, MA, USA

DAMP '11: Proceedings of the sixth workshop on Declarative aspects of multicore programming, January 2011, Pages 25–34
https://doi.org/10.1145/1926354.1926360

Published: 23 January 2011

ABSTRACT

Graphics processors (GPUs) are highly parallel devices that promise high performance, and they are now flexible enough to be used for general-purpose computing. A programming language based on implicitly data-parallel collective array operations can permit high-level, effective programming of GPUs. I describe three optimizations for such a language: automatic use of GPU shared memory cache, array fusion, and hoisting of nested parallel constructs. These optimizations are simple to implement because of the design of the language to which they are applied but can result in large run-time speedups.
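
As a rough illustration of the second optimization named above, array fusion, the following is a minimal Python sketch. It is not taken from the paper: the function names (`map_unfused`, `map_fused`, `sum_squares_fused`) are hypothetical, and plain Python lists stand in for GPU arrays. It only shows the general idea that consecutive collective operations can be combined so that no intermediate array is built.

```python
def map_unfused(f, g, xs):
    """Apply g, then f, as two separate collective operations."""
    tmp = [g(x) for x in xs]      # intermediate array is materialized
    return [f(y) for y in tmp]    # second full traversal


def map_fused(f, g, xs):
    """Fused form: map f (map g xs) becomes map (f . g) xs."""
    return [f(g(x)) for x in xs]  # one traversal, no intermediate array


def sum_squares_fused(xs):
    """A map fused into a reduction: sum (map square xs) in a single loop."""
    total = 0
    for x in xs:
        total += x * x            # square and accumulate in the same pass
    return total


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    double = lambda x: 2 * x      # hypothetical element-wise functions
    inc = lambda x: x + 1
    print(map_unfused(inc, double, data))
    print(map_fused(inc, double, data))
    print(sum_squares_fused(data))
```

On a GPU, the unfused form would typically correspond to two kernel launches with a temporary array written to and read back from global memory, while the fused form would need only one; the sketch assumes the element-wise functions have no side effects, which is what makes the rewrite safe.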

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation
    2. General programming languages
      1. Language types
        Functional languages

Published in
DAMP '11: Proceedings of the sixth workshop on Declarative aspects of multicore programming
January 2011
72 pages
ISBN: 9781450304863
DOI: 10.1145/1926354
General Chair: Manuel Carro, Universidad Politécnica de Madrid, Spain
Program Chair: John Reppy, University of Chicago, USA
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
array programming
cuda
gpgpu
Qualifiers
- research-article