research-article

Compiling Python to a hybrid execution environment

Authors:

José Nelson Amaral

GPGPU-3: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units

Pages 19 - 30

https://doi.org/10.1145/1735688.1735695

Published: 14 March 2010

Abstract

A new compilation framework enables numerically intensive applications written in Python to run on a hybrid execution environment formed by a CPU and a GPU. The compiler automatically computes the set of memory locations that must be transferred to the GPU and produces the correct mapping between the CPU and GPU address spaces, so the programming model provides a virtual shared address space. The framework combines unPython, an ahead-of-time compiler from Python/NumPy to C, with jit4GPU, a just-in-time compiler from C to the AMD CAL interface. Experimental evaluation shows that for some benchmarks the generated GPU code is 50 times faster than the generated OpenMP code. For single-precision computations, GPU performance also compares favorably with optimized CPU BLAS code in most cases.
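The abstract names two compiler tasks without spelling them out: deciding which memory locations a GPU kernel needs, and translating CPU-side addresses into GPU-side ones. The following is not the authors' analysis, and it does not model unPython or jit4GPU; it is a minimal Python sketch under simplifying assumptions: a single loop `for i in range(lo, hi)`, one-dimensional subscripts of the affine form `coeff*i + offset`, and one conservative bounding interval per array that is copied to the GPU starting at offset 0. The names `AffineRef`, `accessed_range`, `transfer_regions`, and `gpu_offset` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Interval = Tuple[int, int]  # inclusive [first, last] element indices


@dataclass(frozen=True)
class AffineRef:
    """Hypothetical model of a 1-D subscript array[coeff * i + offset]."""
    array: str
    coeff: int
    offset: int


def accessed_range(ref: AffineRef, lo: int, hi: int) -> Optional[Interval]:
    """Elements touched by `ref` while i runs over range(lo, hi)."""
    if hi <= lo:
        return None  # the loop body never executes
    # An affine subscript is monotonic in i, so its extremes occur at the
    # first and last iterations, whatever the sign of coeff.
    a = ref.coeff * lo + ref.offset
    b = ref.coeff * (hi - 1) + ref.offset
    return (min(a, b), max(a, b))


def transfer_regions(refs: Iterable[AffineRef], lo: int, hi: int) -> Dict[str, Interval]:
    """Per-array bounding interval of the elements the loop reads or writes.

    Conservative: for |coeff| > 1 the interval also covers skipped elements.
    """
    regions: Dict[str, Interval] = {}
    for ref in refs:
        r = accessed_range(ref, lo, hi)
        if r is None:
            continue
        if ref.array in regions:
            first, last = regions[ref.array]
            regions[ref.array] = (min(first, r[0]), max(last, r[1]))
        else:
            regions[ref.array] = r
    return regions


def gpu_offset(cpu_index: int, region: Interval) -> int:
    """Translate a CPU-side element index into an offset in the GPU copy,
    assuming only region[0]..region[1] was copied, starting at offset 0."""
    first, last = region
    if not first <= cpu_index <= last:
        raise IndexError(f"index {cpu_index} lies outside the transferred region {region}")
    return cpu_index - first


# Example loop: for i in range(1, 9): y[i] = 0.5 * (x[i - 1] + x[i + 1])
refs = [AffineRef("x", 1, -1), AffineRef("x", 1, 1), AffineRef("y", 1, 0)]
regions = transfer_regions(refs, 1, 9)
print(regions)                      # {'x': (0, 9), 'y': (1, 8)}
print(gpu_offset(5, regions["y"]))  # 4
```

Summarizing each array by a single interval keeps each transfer a single contiguous copy, at the cost of moving some elements the kernel never uses; a more precise analysis could track strided or disjoint regions separately.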

Information

Published In

ACM Other conferences

GPGPU-3: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units

March 2010

124 pages

ISBN:9781605589350

DOI:10.1145/1735688

General Chairs:
David Kaeli, Northeastern University, Boston, MA
Miriam Leeser, Northeastern University, Boston, MA

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

GPGPU-3: Third Workshop on General-Purpose Computation on Graphics Processing Units

March 14, 2010

Pittsburgh, Pennsylvania, USA

Acceptance Rates

Overall Acceptance Rate 57 of 129 submissions, 44%

Bibliometrics & Citations

Article Metrics

Total Citations: 17
Total Downloads: 745
Downloads (last 12 months): 12
Downloads (last 6 weeks): 1

Reflects downloads up to 16 Feb 2025

Citations

Cited By

Zhou T, Shirako J, Sarkar V, Rodríguez G, Sadayappan P, Sukumaran-Rajam A (2024). APPy: Annotated Parallelism for Python on GPUs. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, pp. 113-125. Online publication date: 17-Feb-2024.
https://dl.acm.org/doi/10.1145/3640537.3641575
Zhong J, Hort M, Sarro F, Wagner M (2022). Py2Cy. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1950-1955. Online publication date: 9-Jul-2022.
https://dl.acm.org/doi/10.1145/3520304.3534037
Duan J, Hamlen K, Ferrell B, Weissman J, Butt A, Smirni E (2019). Better Late Than Never. Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, pp. 207-218. Online publication date: 17-Jun-2019.
https://dl.acm.org/doi/10.1145/3307681.3326604
Yang Y, Prestwood S, Barnes C (2016). VizGen. ACM Transactions on Graphics 35(6), pp. 1-13. Online publication date: 5-Dec-2016.
https://dl.acm.org/doi/10.1145/2980179.2982403
Garg R, Hendren L, Amaral J, Torrellas J (2014). Velociraptor. Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, pp. 317-330. Online publication date: 24-Aug-2014.
https://dl.acm.org/doi/10.1145/2628071.2628097
Garg R, Hendren L (2014). Just-in-time shape inference for array-based languages. Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pp. 50-55. Online publication date: 9-Jun-2014.
https://dl.acm.org/doi/10.1145/2627373.2627382
Alvanos M, Amaral J, Tiotto E, Farreras M, Martorell X (2014). Reducing Compiler-Inserted Instrumentation in Unified-Parallel-C Code Generation. Proceedings of the 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, pp. 270-277. Online publication date: 22-Oct-2014.
https://dl.acm.org/doi/10.1109/SBAC-PAD.2014.34
Kristensen M, Lund S, Blum T, Skovhede K, Vinter B (2014). Bohrium. Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp. 312-321. Online publication date: 19-May-2014.
https://dl.acm.org/doi/10.1109/IPDPSW.2014.44
Blum T, Kristensen M, Vinter B (2014). Transparent GPU Execution of NumPy Applications. Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp. 1002-1010. Online publication date: 19-May-2014.
https://dl.acm.org/doi/10.1109/IPDPSW.2014.114
Mueller F, Zhang Y (2013). Hidp. Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1-11. Online publication date: 23-Feb-2013.
https://dl.acm.org/doi/10.1109/CGO.2013.6494994