Providing Source Code Level Portability Between CPU and GPU with MapCG

Hong, Chun-Tao; Chen, De-Hao; Chen, Yu-Bei; Chen, Wen-Guang; Zheng, Wei-Min; Lin, Hai-Bo

doi:10.1007/s11390-012-1205-4

Regular Paper
Published: 09 January 2012

Volume 27, pages 42–56 (2012)

Journal of Computer Science and Technology

Chun-Tao Hong¹,
De-Hao Chen¹,
Yu-Bei Chen²,
Wen-Guang Chen¹,
Wei-Min Zheng¹ &
Hai-Bo Lin³

Abstract

Graphics processing units (GPUs) have taken an important role in the general-purpose computing market in recent years. At present, the common approach to programming GPUs is to write GPU-specific code with low-level GPU APIs such as CUDA. Although this approach can achieve good performance, it creates serious portability issues, as programmers are required to write a specific version of the code for each potential target architecture. This results in high development and maintenance costs. We believe it is desirable to have a programming model which provides source code portability between CPUs and GPUs, as well as between different GPUs. This would allow programmers to write one version of the code, which can be compiled and executed efficiently on either CPUs or GPUs without modification. In this paper, we propose MapCG, a MapReduce framework that provides source code level portability between CPUs and GPUs. In contrast to other approaches such as OpenCL, our framework, based on MapReduce, provides a high-level programming model and makes programming much easier. We describe the design of MapCG, including the MapReduce-style high-level programming framework and the runtime system on the CPU and GPU. A prototype of the MapCG runtime, supporting multi-core CPUs and NVIDIA GPUs, was implemented. Our experimental results show that this implementation can execute the same source code efficiently on multi-core CPU platforms and GPUs, achieving an average speedup of 1.6x to 2.5x over previous implementations of MapReduce on eight commonly used applications.
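
To make the idea of a single source running on several backends concrete, the following is a minimal sketch in Python, not MapCG's actual API (which this page does not show). It assumes a word-count workload; the names `map_word_count`, `reduce_sum`, `run_sequential`, and `run_threaded` are hypothetical. The two runners stand in for two different runtimes: the user-supplied map and reduce functions are written once and passed unchanged to either one.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_word_count(chunk):
    """User-supplied map: emit a (word, 1) pair for every word in a text chunk."""
    return [(word, 1) for word in chunk.split()]


def reduce_sum(key, values):
    """User-supplied reduce: combine all values emitted for one key."""
    return sum(values)


def group_by_key(pairs):
    """Shuffle step: collect the values emitted for each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_sequential(chunks, map_fn, reduce_fn):
    """Hypothetical backend 1: run every map call on the calling thread."""
    pairs = [pair for chunk in chunks for pair in map_fn(chunk)]
    return {k: reduce_fn(k, v) for k, v in group_by_key(pairs).items()}


def run_threaded(chunks, map_fn, reduce_fn, workers=4):
    """Hypothetical backend 2: run the map calls on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, chunks))
    pairs = [pair for part in mapped for pair in part]
    return {k: reduce_fn(k, v) for k, v in group_by_key(pairs).items()}


# The same user code is handed to both backends without modification.
chunks = ["gpu cpu gpu", "cpu map reduce", "map gpu"]
seq_counts = run_sequential(chunks, map_word_count, reduce_sum)
par_counts = run_threaded(chunks, map_word_count, reduce_sum)
```

Both runners return the same dictionary of word counts. In this sketch the portability comes from keeping all scheduling and grouping inside the runner, so the user functions contain nothing backend-specific; a real CPU/GPU runtime would additionally have to handle memory allocation and data transfer on each device.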

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Chun-Tao Hong (Member, CCF), De-Hao Chen (Member, CCF), Wen-Guang Chen (Member, CCF, ACM, IEEE) & Wei-Min Zheng (Member, CCF, ACM, IEEE)
Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
Yu-Bei Chen
IBM China Research Lab, Beijing, 100094, China
Hai-Bo Lin (Member, ACM, IEEE)

Corresponding author

Correspondence to Chun-Tao Hong.

Additional information

This work was supported by the National Natural Science Foundation of China under Grant No. 60973143, the National High Technology Research and Development 863 Program of China under Grant No. 2008AA01A201, and the National Basic Research 973 Program of China under Grant No. 2007CB310900.

About this article

Cite this article

Hong, CT., Chen, DH., Chen, YB. et al. Providing Source Code Level Portability Between CPU and GPU with MapCG. J. Comput. Sci. Technol. 27, 42–56 (2012). https://doi.org/10.1007/s11390-012-1205-4

Received: 23 February 2011
Revised: 27 May 2011
Published: 09 January 2012
Issue Date: January 2012
DOI: https://doi.org/10.1007/s11390-012-1205-4
