DOI: 10.1145/1926354.1926360
research-article

Simple optimizations for an applicative array language for graphics processors

Published: 23 January 2011

ABSTRACT

Graphics processors (GPUs) are highly parallel devices that promise high performance, and they are now flexible enough to be used for general-purpose computing. A programming language based on implicitly data-parallel collective array operations can permit high-level, effective programming of GPUs. I describe three optimizations for such a language: automatic use of GPU shared memory cache, array fusion, and hoisting of nested parallel constructs. These optimizations are simple to implement because of the design of the language to which they are applied but can result in large run-time speedups.
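As a rough illustration of what array fusion means for collective array operations, the sketch below represents an array as a length plus an index function, so that chained maps compose functions instead of building intermediate arrays. This is not the paper's implementation, which the abstract does not describe; the names `Delayed`, `from_list`, `amap`, `azip_with`, `force`, and `asum` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Delayed:
    """Hypothetical delayed array: a length plus a function from index to element."""
    length: int
    index: Callable[[int], float]


def from_list(xs: List[float]) -> Delayed:
    """Wrap a concrete list as a delayed array."""
    return Delayed(len(xs), lambda i: xs[i])


def amap(f: Callable[[float], float], arr: Delayed) -> Delayed:
    """Elementwise map. Computes nothing yet; it only composes f with the index function."""
    return Delayed(arr.length, lambda i: f(arr.index(i)))


def azip_with(f: Callable[[float, float], float], a: Delayed, b: Delayed) -> Delayed:
    """Elementwise combination of two delayed arrays, also without materializing anything."""
    return Delayed(min(a.length, b.length), lambda i: f(a.index(i), b.index(i)))


def force(arr: Delayed) -> List[float]:
    """Materialize the array in a single pass over its indices."""
    return [arr.index(i) for i in range(arr.length)]


def asum(arr: Delayed) -> float:
    """Reduce in a single pass; the fused element function runs once per index."""
    total = 0.0
    for i in range(arr.length):
        total += arr.index(i)
    return total


def dot(xs: List[float], ys: List[float]) -> float:
    """Dot product written as zip-then-sum; no intermediate product array is built."""
    return asum(azip_with(lambda x, y: x * y, from_list(xs), from_list(ys)))
```

On a GPU, the single loop in `force` or `asum` would correspond to one kernel rather than one kernel per collective operation, which is the usual motivation for fusion; the sequential loops here stand in for that only conceptually.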

Published in

DAMP '11: Proceedings of the sixth workshop on Declarative aspects of multicore programming
January 2011, 72 pages
ISBN: 9781450304863
DOI: 10.1145/1926354
General Chair: Manuel Carro
Program Chair: John Reppy

Copyright © 2011 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
