skip to main content
10.1145/2884045.2884053acmotherconferencesArticle/Chapter ViewAbstractPublication PagesgpgpuConference Proceedingsconference-collections
research-article

GPUpIO: the case for I/O-driven preemption on GPUs

Published:12 March 2016Publication History

ABSTRACT

As GPUs become general purpose, they are outgrowing the coprocessor model and require convenient I/O abstractions such as files and network sockets. Recent studies have shown the benefits of native GPU I/O layers, in terms of both programmability and performance. However, due to lack of hardware support, the GPU threads performing I/O calls are forced to busy-wait for the completion of I/O operations, resulting in underutilized hardware, higher power consumption, and reduced system throughput.

We argue that I/O-driven preemption improves the performance of existing solutions, despite many challenging system characteristics such as a large kernel state. We analyze the benefits of adding preemption support using a simple system performance model, and, encouraged by the results, explore the design of a software-based preemption mechanism for GPUs. In our prototype, GPUpIO, we implement a source-to-source compiler for state checkpoint and restoration, and a runtime library for scheduling preempted thread-blocks, which together enable I/O-driven preemption for GPUs.

We evaluate our prototype across a variety of system parameters and workloads to determine when preemption is worthwhile. We show that in some workloads the I/O-driven preemption approach may indeed double the effective system throughput by completely hiding the I/O latency behind computations. However, we also observe that the software-only solution is currently limited, not only due to its overheads, but also because it does not have sufficient control of the hardware scheduler queue and therefore may lead to starvation of I/O kernels. We then discuss a new hardware feature that, if added, may render a general I/O-driven preemption mechanism on GPUs practical.

References

  1. S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron. Rodinia: A Benchmark Suite for Heterogeneous Computing. In Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC), IISWC '09, pages 44--54, Washington, DC, USA, 2009. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments. In Proceedings of the USENIX Annual Technical Conference, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Kim, S. Huh, X. Zhang, Y. Hu, A. Wated, E. Witchel, and M. Silberstein. GPUnet: Networking Abstractions for GPU Programs. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 201--216, Broomfield, CO, Oct. 2014. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. B. Kirk and W.-m. W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1st edition, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Nukada, H. Takizawa, and S. Matsuoka. NVCR: A Transparent Checkpoint-Restart Library for NVIDIA CUDA. In Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on, pages 104--113, May 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. NVIDIA Corporation. NVIDIA CUDA Compute Unified Device Architecture Programming Guide. NVIDIA Corporation, 2007.Google ScholarGoogle Scholar
  7. J. J. K. Park, Y. Park, and S. Mahlke. Chimera: Collaborative Preemption for Multitasking on a Shared GPU. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pages 593--606, New York, NY, USA, 2015. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. Reyes, I. Lopez, J. Fumero, and F. de Sande. accULL: An User-directed Approach to Heterogeneous Programming. In Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on, pages 654--661, July 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. J. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel. PTask: Operating System Abstractions to Manage GPUs As Compute Devices. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 233--248, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. J. Rossbach, Y. Yu, J. Currey, J.-P. Martin, and D. Fetterly. Dandelion: A Compiler and Runtime for Heterogeneous Systems. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 49--68, New York, NY, USA, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Silberstein, B. Ford, I. Keidar, and E. Witchel. GPUfs: integrating file systems with GPUs. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. E. Stone, D. Gohara, and G. Shi. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems. IEEE Des. Test, 12(3):66--73, May 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. H. Takizawa, K. Sato, K. Komatsu, and H. Kobayashi. CheCUDA: A Checkpoint/Restart Tool for CUDA Applications. In Parallel and Distributed Computing, Applications and Technologies, 2009 International Conference on, pages 408--413, Dec 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. GPUpIO: the case for I/O-driven preemption on GPUs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Other conferences
        GPGPU '16: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit
        March 2016
        107 pages
        ISBN:9781450341950
        DOI:10.1145/2884045

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 March 2016

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        GPGPU '16 Paper Acceptance Rate9of23submissions,39%Overall Acceptance Rate57of129submissions,44%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader