
Implementation of GPU virtualization using PCI pass-through mechanism

The Journal of Supercomputing

Abstract

As a general-purpose, scalable parallel programming model for writing highly parallel applications, CUDA from NVIDIA provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. It has proven quite effective for programming multithreaded many-core GPUs that scale transparently to hundreds of cores; as a result, scientists throughout industry and academia are using CUDA to achieve dramatic speedups on production and research codes. GPU-based clusters are likely to play an essential role in future cloud computing centers, because some computation-intensive applications may require GPUs as well as CPUs. In this paper, we adopted the PCI pass-through technology to set up virtual machines in a virtualized environment, so that a guest could access the NVIDIA graphics card directly and perform CUDA high-performance computing. In this way, the virtual machine has not only virtual CPUs but also a real GPU for computing, and its performance is expected to increase dramatically. We measured the performance difference between physical and virtual machines using CUDA, and investigated how varying the number of virtual CPUs assigned to a virtual machine influences CUDA performance. Finally, we compared the CUDA performance of two open-source virtualization hypervisor environments, with and without PCI pass-through. The experimental results indicate which environment is the most efficient for running CUDA in a virtualized setting.
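
The paper's central claim is that a guest VM given a real GPU through PCI pass-through can run unmodified CUDA workloads at near-native speed. Below is a minimal sketch of such a workload (not taken from the paper; all names are illustrative): a block-wise reduction kernel that exercises exactly the abstractions listed above, a grid of thread blocks, per-block shared memory, and __syncthreads() barriers, and that times itself with CUDA events, which is one simple way to compare kernel runtimes on a physical host against a pass-through guest.

// gpu_reduce.cu -- illustrative sketch only; build with: nvcc -O2 gpu_reduce.cu -o gpu_reduce
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One thread block cooperatively sums a slice of the input in shared memory.
__global__ void blockSum(const float *in, float *out, int n) {
    extern __shared__ float cache[];              // per-block shared memory
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    cache[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();                              // barrier: all loads visible to the block

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) cache[tid] += cache[tid + stride];
        __syncthreads();                          // barrier between reduction steps
    }
    if (tid == 0) out[blockIdx.x] = cache[0];     // one partial sum per block
}

int main() {
    const int n = 1 << 20, threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *hIn  = (float *)malloc(n * sizeof(float));
    float *hOut = (float *)malloc(blocks * sizeof(float));
    for (int i = 0; i < n; ++i) hIn[i] = 1.0f;    // expected total is n

    float *dIn, *dOut;
    cudaMalloc(&dIn,  n      * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    blockSum<<<blocks, threads, threads * sizeof(float)>>>(dIn, dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);       // kernel time, comparable across host and guest

    cudaMemcpy(hOut, dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    double total = 0.0;
    for (int i = 0; i < blocks; ++i) total += hOut[i];
    printf("kernel: %.3f ms, sum = %.0f (expected %d)\n", ms, total, n);

    free(hIn); free(hOut);
    cudaFree(dIn); cudaFree(dOut);
    return 0;
}

Running the same binary on the bare-metal host and inside a VM that receives the GPU via PCI pass-through, and comparing the reported times, mirrors the kind of measurement the paper performs across its hypervisor configurations.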

Acknowledgement

This work was supported in part by the National Science Council, Taiwan, ROC, under grant numbers NSC 102-2218-E-029-002, NSC 101-2218-E-029-004, and NSC 102-2622-E-029-005-CC3. This work was also supported in part by Tunghai University, Taiwan, ROC, under grant number GREEnS 04-2.

Author information

Corresponding author

Correspondence to Chao-Tung Yang.

About this article

Cite this article

Yang, CT., Liu, JC., Wang, HY. et al. Implementation of GPU virtualization using PCI pass-through mechanism. J Supercomput 68, 183–213 (2014). https://doi.org/10.1007/s11227-013-1034-4
