Abstract
Both CPUs and GPUs now follow the trend toward multicore processors. Parallel processing presents both an opportunity and a challenge: explicit parallelization of software, whether by programmers or by compilers, is the key to improving performance on multicore chips. In this paper, we first introduce several automatic parallelization tools based on OpenMP, which can save the time otherwise spent rewriting code for parallel processing on multicore systems. We then focus on ROSE and explore it in depth. We also implement an interface that reduces the complexity of using ROSE, and we apply some automatic parallelization for CUDA.
This study was supported in part by the National Science Council, Taiwan ROC, under grant numbers NSC 100-2218-E-029-001, NSC 101-2221-E-029-014 and NSC 101-2622-E-029-008-CC3.
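To make concrete what OpenMP-based parallelization of a loop looks like, the following is a minimal sketch and is not taken from the paper. It shows a sequential loop annotated with the kind of directive a programmer or an automatic parallelization tool might add. The function name `scale_and_sum` and its parameters are hypothetical, chosen only for illustration; the output of ROSE or any other specific tool may differ.

```c
/* Hypothetical example: scale each element of `in` by `factor`,
 * store the result in `out`, and return the sum of the results.
 * Each iteration is independent except for the accumulation into
 * `sum`, which OpenMP handles with a reduction clause. */
double scale_and_sum(const double *in, double *out, int n, double factor)
{
    double sum = 0.0;

    /* Directive of the kind a parallelizer might insert. When the
     * code is compiled without OpenMP support, the pragma is ignored
     * and the loop runs sequentially with the same result. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        out[i] = in[i] * factor;
        sum += out[i];
    }
    return sum;
}
```

With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag. The original loop body is unchanged; only the directive is added, which is why such tools can save the effort of rewriting code by hand.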
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, CT., Chang, TC., Huang, KL., Liu, JC., Chang, CH. (2012). Performance Evaluation of OpenMP and CUDA on Multicore Systems. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_25
DOI: https://doi.org/10.1007/978-3-642-33065-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33064-3
Online ISBN: 978-3-642-33065-0
eBook Packages: Computer Science (R0)