DOI: 10.1145/3388333.3388654
research-article

Automated OpenCL GPU kernel fusion for Stan Math

Published: 27 April 2020

ABSTRACT

We developed an OpenCL GPU kernel fusion library for Stan, a software package for Bayesian statistics. The library automatically combines kernels, optimizes computation, and is simple to use. In practice, it speeds up the development of new GPU kernels while keeping the performance of automatically combined kernels comparable to that of hand-crafted kernels. We demonstrate this with experiments on basic operations and a linear regression model likelihood.
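
To make the idea of kernel fusion concrete, the following is a minimal sketch and not the Stan Math library's actual API. It assumes a tiny, hypothetical expression tree (`Load`, `BinaryOp`) and a hypothetical `fuse` helper that emits the OpenCL C source of a single kernel for a whole element-wise expression, so that `a * b + c` needs one kernel launch and no intermediate buffer instead of two launches with a temporary. The sketch only generates source text; compiling and running it would require an OpenCL runtime and a device with double-precision support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    """Hypothetical leaf node: reads element i of a named input buffer."""
    name: str

    def code(self):
        return f"{self.name}[i]"

    def buffers(self):
        return [self.name]


@dataclass(frozen=True)
class BinaryOp:
    """Hypothetical inner node: element-wise binary operation."""
    op: str
    lhs: object
    rhs: object

    def code(self):
        return f"({self.lhs.code()} {self.op} {self.rhs.code()})"

    def buffers(self):
        # Collect input buffers in first-use order, without duplicates,
        # so a buffer used twice becomes a single kernel argument.
        seen = self.lhs.buffers()
        return seen + [b for b in self.rhs.buffers() if b not in seen]


def fuse(kernel_name, expr):
    """Emit OpenCL C source for one kernel that evaluates the whole expression."""
    args = ", ".join(f"__global const double* {b}" for b in expr.buffers())
    return (
        f"__kernel void {kernel_name}(__global double* out, {args}) {{\n"
        "  const int i = get_global_id(0);\n"
        f"  out[i] = {expr.code()};\n"
        "}\n"
    )


# a * b + c as one fused kernel instead of a multiply kernel plus an add kernel.
expr = BinaryOp("+", BinaryOp("*", Load("a"), Load("b")), Load("c"))
source = fuse("fused_mul_add", expr)
print(source)
```

A real fusion library must also handle shapes, scalar arguments, reductions, and kernel caching, none of which this sketch attempts.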

        • Published in

          IWOCL '20: Proceedings of the International Workshop on OpenCL
          April 2020
          104 pages
          ISBN:9781450375313
          DOI:10.1145/3388333

          Copyright © 2020 ACM

          © 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          IWOCL '20 Paper Acceptance Rate: 21 of 30 submissions, 70%
          Overall Acceptance Rate: 84 of 152 submissions, 55%
