ABSTRACT
Recent studies have shown promising performance benefits of pipelined stencil applications. An important factor for the computing efficiency of such pipelines is the granularity of a task. We presents GOPipe, the first granularity-oblivious programming framework for efficient pipelined stencil executions. With GOPipe, programmers no longer need to specify the appropriate task granularity. GOPipe automatically finds it, and schedules tasks of that granularity while observing all inter-task and inter-stage data dependencies. In our experiments on four real-life applications, GOPipe outperforms the state-of-the-art by up to 4.57× with a much better programming productivity.
- Markus Steinberger, Michael Kenzel, Pedro Boechat, Bernhard Kerbl, Mark Dokter, and Dieter Schmalstieg. 2014. Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU. ACM Transactions on Graphics 33, 6 (2014), 1--11. Google ScholarDigital Library
- Zhen Zheng, Chanyoung Oh, Jidong Zhai, Xipeng Shen, Youngmin Yi, and Wenguang Chen. 2017. VersaPipe: A Versatile Programming Framework for Pipelined Computing on GPU. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-50). Cambridge, MA, USA, 587--599. Google ScholarDigital Library
Index Terms
- GOPipe: a granularity-oblivious programming framework for pipelined stencil executions on GPU
Recommendations
GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation TechniquesRecent studies have shown promising performance benefits when multiple stages of a pipelined stencil application are mapped to different parts of a GPU to run concurrently. An important factor for the computing efficiency of such pipelines is the ...
Juggler: a dependence-aware task-based execution framework for GPUs
PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingScientific applications with single instruction, multiple data (SIMD) computations show considerable performance improvements when run on today's graphics processing units (GPUs). However, the existence of data dependences across thread blocks may ...
Juggler: a dependence-aware task-based execution framework for GPUs
PPoPP '18Scientific applications with single instruction, multiple data (SIMD) computations show considerable performance improvements when run on today's graphics processing units (GPUs). However, the existence of data dependences across thread blocks may ...
Comments