DOI: 10.1145/3293883.3297859

Optimizing GPU programs by register demotion: poster

Published: 16 February 2019

ABSTRACT

GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage of on-chip resources. If the resource demand cannot be met, the GPU reduces the number of concurrent threads, hurting program performance. We have observed that registers are the occupancy limiter while shared memory tends to be underused. The de facto approach spills excess registers to off-chip memory, ignoring shared memory and leaving on-chip resources underutilized. To reduce register demand, our work presents a novel compiler technique, called register demotion, that places register data into the underutilized shared memory by transforming the GPU assembly code (SASS). Register demotion achieves up to 18% speedup over the nvcc compiler, with a geometric mean of 7%.
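The occupancy argument above can be illustrated with a small back-of-the-envelope model. This is only a sketch and not the poster's method: the per-SM limits (65,536 registers, 96 KB of shared memory, 2,048 resident threads) are assumed example values, the function names are hypothetical, and the model ignores allocation granularity and per-SM block-count limits that real hardware imposes.

```python
# Simplified occupancy model. All hardware limits are ASSUMED example values,
# not figures taken from the poster.

def resident_blocks(regs_per_thread, smem_per_block, threads_per_block,
                    regs_per_sm=65536, smem_per_sm=98304,
                    max_threads_per_sm=2048):
    """Thread blocks that fit on one SM, taking the tightest resource limit."""
    by_threads = max_threads_per_sm // threads_per_block
    by_regs = regs_per_sm // (regs_per_thread * threads_per_block)
    # A kernel that uses no shared memory is not limited by it.
    by_smem = smem_per_sm // smem_per_block if smem_per_block else by_threads
    return min(by_threads, by_regs, by_smem)


def occupancy(regs_per_thread, smem_per_block, threads_per_block,
              max_threads_per_sm=2048):
    """Fraction of the SM's thread slots that are occupied."""
    blocks = resident_blocks(regs_per_thread, smem_per_block,
                             threads_per_block,
                             max_threads_per_sm=max_threads_per_sm)
    return blocks * threads_per_block / max_threads_per_sm


def demote(regs_per_thread, smem_per_block, threads_per_block,
           n_demoted, bytes_per_reg=4):
    """Hypothetical helper: move n_demoted registers per thread to shared memory.

    Each demoted 32-bit register costs bytes_per_reg bytes of shared memory
    for every thread in the block.
    """
    new_regs = regs_per_thread - n_demoted
    new_smem = smem_per_block + n_demoted * bytes_per_reg * threads_per_block
    return new_regs, new_smem


# Example kernel (made-up numbers): 256 threads per block,
# 40 registers per thread, 4 KB of shared memory per block.
before = occupancy(40, 4096, 256)
regs, smem = demote(40, 4096, 256, n_demoted=8)
after = occupancy(regs, smem, 256)
print(before, after)
```

With these assumed numbers the kernel is register-limited to 6 resident blocks (75% occupancy). Demoting 8 registers per thread raises shared memory use to 12 KB per block and lets 8 blocks fit, at which point registers, shared memory, and the thread limit are all exactly saturated. The model says nothing about the extra latency of shared memory accesses, which any real demotion scheme has to weigh against the occupancy gain.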

Published in

PPoPP '19: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming
February 2019
472 pages
ISBN: 9781450362252
DOI: 10.1145/3293883

Copyright © 2019 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

PPoPP '19 paper acceptance rate: 29 of 152 submissions, 19%. Overall acceptance rate: 230 of 1,014 submissions, 23%.