Performance Evaluation of OpenMP and CUDA on Multicore Systems

Yang, Chao-Tung; Chang, Tzu-Chieh; Huang, Kuan-Lung; Liu, Jung-Chun; Chang, Chih-Hung

doi:10.1007/978-3-642-33065-0_25


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7440)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing


Abstract

Both CPUs and GPUs now follow the trend toward multi-core processors. Parallel processing therefore presents both an opportunity and a challenge. Explicit parallelization of software, whether by programmers or by compilers, is the key to improving performance on multi-core chips. In this paper, we first introduce several automatic parallelization tools based on OpenMP, which can save the time needed to rewrite code for parallel processing on multicore systems. We then focus on ROSE and explore it in depth. We also implement an interface that reduces the complexity of using ROSE, and we apply some automatic parallelization for CUDA.
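As a rough illustration of the kind of rewrite an OpenMP-based automatic parallelization tool aims to save the programmer from doing by hand, the sketch below shows a serial loop next to a directive-annotated version. It is not output from ROSE or from any tool named in the paper; the function names (vec_add_serial, vec_add_omp, dot_omp) are hypothetical and chosen only for this example. Only standard OpenMP directives are used.

```c
#include <assert.h>

/* Serial loop: each iteration writes a distinct c[i], so there is
   no dependence between iterations. */
void vec_add_serial(const double *a, const double *b, double *c, int n) {
    assert(a && b && c);
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* The same loop with an OpenMP work-sharing directive. A parallelizing
   tool would need to prove the iterations independent before adding it.
   If the compiler is run without OpenMP support, the pragma is ignored
   and the loop simply runs serially. */
void vec_add_omp(const double *a, const double *b, double *c, int n) {
    assert(a && b && c);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* A loop that accumulates into one variable is not independent as
   written; the reduction clause gives each thread a private partial
   sum and combines the partial sums at the end of the loop. */
double dot_omp(const double *a, const double *b, int n) {
    assert(a && b);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag. The two vec_add variants should produce identical results, which gives a simple check that a directive was added safely.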

This study was supported in part by the National Science Council, Taiwan ROC, under grant numbers NSC 100-2218-E-029-001, NSC 101-2221-E-029-014 and NSC 101-2622-E-029-008-CC3.



Author information

Authors and Affiliations

Department of Computer Science, Tunghai University, Taichung City, 40704, Taiwan
Chao-Tung Yang, Tzu-Chieh Chang, Kuan-Lung Huang & Jung-Chun Liu
Department of Information Management, Hsiuping University of Science and Technology, Taichung City, 41280, Taiwan
Chih-Hung Chang

Editor information

Editors and Affiliations

School of Information Technology, Deakin University, Melbourne Burwood Campus, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Yang Xiang
SEECS, University of Ottawa, 8, King Edward Ave, K1N 6N5, Ottawa, ON, Canada
Ivan Stojmenovic
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, Higashi-ku, 813-8503, Fukuoka, Japan
Bernady O. Apduhan
School of Information Science and Engineering, Central South University, 410083, Changsha, Hunan Province, P.R. China
Guojun Wang
Department of Information Engineering, Hiroshima University, 1-4-1, Kagamiyama, 739-8527, Higashi-Hiroshima, Japan
Koji Nakano
School of Information Technologies, University of Sydney, Building J12, 2006, Sydney, NSW, Australia
Albert Zomaya

About this paper

Cite this paper

Yang, CT., Chang, TC., Huang, KL., Liu, JC., Chang, CH. (2012). Performance Evaluation of OpenMP and CUDA on Multicore Systems. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_25

DOI: https://doi.org/10.1007/978-3-642-33065-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33064-3
Online ISBN: 978-3-642-33065-0
eBook Packages: Computer Science, Computer Science (R0)