Abstract
Graphics processing units (GPUs) have taken on an important role in the general-purpose computing market in recent years. At present, the common approach to programming GPUs is to write GPU-specific code with low-level GPU APIs such as CUDA. Although this approach can achieve good performance, it creates serious portability issues, because programmers must write a specific version of the code for each potential target architecture. This results in high development and maintenance costs. We believe it is desirable to have a programming model that provides source code portability between CPUs and GPUs, as well as between different GPUs. This would allow programmers to write one version of the code, which can be compiled and executed efficiently on either CPUs or GPUs without modification. In this paper, we propose MapCG, a MapReduce framework that provides source code level portability between CPUs and GPUs. In contrast to other approaches such as OpenCL, our framework, based on MapReduce, provides a high-level programming model and makes programming much easier. We describe the design of MapCG, including the MapReduce-style high-level programming framework and the runtime system on the CPU and GPU. A prototype of the MapCG runtime, supporting multi-core CPUs and NVIDIA GPUs, was implemented. Our experimental results show that this implementation can execute the same source code efficiently on multi-core CPU platforms and GPUs, achieving an average speedup of 1.6x to 2.5x over previous implementations of MapReduce on eight commonly used applications.
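To make the portability idea concrete, the following is a minimal sketch of a MapReduce-style word count in which the user-written map and reduce functions stay unchanged while a runtime chooses how to execute them. It is not MapCG's API: the names `run_mapreduce`, `word_count_map`, `word_count_reduce`, and the `backend` parameter are hypothetical, and a thread pool merely stands in for a second execution target such as a GPU.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def word_count_map(chunk):
    """User-written map: emit a (word, 1) pair for every word in the chunk."""
    return [(word, 1) for word in chunk.split()]


def word_count_reduce(key, values):
    """User-written reduce: sum all counts emitted for one word."""
    return sum(values)


def run_mapreduce(chunks, map_fn, reduce_fn, backend="serial"):
    """Hypothetical runtime: the backend changes, the user functions do not."""
    if backend == "serial":
        emitted = [map_fn(c) for c in chunks]
    elif backend == "threads":
        # Stand-in for a second execution target (assumption, not a GPU).
        with ThreadPoolExecutor() as pool:
            emitted = list(pool.map(map_fn, chunks))
    else:
        raise ValueError(f"unknown backend: {backend}")

    # Group intermediate values by key before reducing.
    groups = defaultdict(list)
    for pairs in emitted:
        for key, value in pairs:
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


chunks = ["gpu cpu gpu", "cpu gpu"]
serial_result = run_mapreduce(chunks, word_count_map, word_count_reduce, "serial")
threaded_result = run_mapreduce(chunks, word_count_map, word_count_reduce, "threads")
```

Both calls use the same `word_count_map` and `word_count_reduce`; only the `backend` argument differs, which mirrors, in a much simplified form, the goal of writing the application once and letting the runtime target different hardware.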
Additional information
This work was supported by the National Natural Science Foundation of China under Grant No. 60973143, the National High Technology Research and Development 863 Program of China under Grant No. 2008AA01A201, and the National Basic Research 973 Program of China under Grant No. 2007CB310900.
Cite this article
Hong, CT., Chen, DH., Chen, YB. et al. Providing Source Code Level Portability Between CPU and GPU with MapCG. J. Comput. Sci. Technol. 27, 42–56 (2012). https://doi.org/10.1007/s11390-012-1205-4
DOI: https://doi.org/10.1007/s11390-012-1205-4