Abstract
As a general-purpose, scalable parallel programming model for highly parallel applications, NVIDIA's CUDA provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. It has proven rather effective for programming multithreaded many-core GPUs that scale transparently to hundreds of cores, and scientists throughout industry and academia use CUDA to achieve dramatic speedups on production and research codes. GPU-based clusters are likely to play an essential role in future cloud computing centers, because some computation-intensive applications may require GPUs as well as CPUs. In this paper, we adopted PCI pass-through technology to set up virtual machines that can use NVIDIA graphics cards for CUDA high-performance computing. In this way, a virtual machine has not only virtual CPUs but also a real GPU for computing, which is expected to increase its performance dramatically. We measured the performance difference between physical and virtual machines using CUDA, and investigated how the number of CPUs assigned to a virtual machine influences CUDA performance. Finally, we compared CUDA performance in two open-source virtualization hypervisor environments, with and without PCI pass-through. The experimental results show which environment is most efficient for running CUDA in a virtual environment.
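A small CPU-side emulation can illustrate the three CUDA abstractions named above: a hierarchy of thread blocks, shared memory, and barrier synchronization. This sketch is not CUDA code and not the authors' implementation. It is a hypothetical Python model that makes the following assumptions:

* Each CUDA thread is represented by one Python thread.
* A plain list stands in for `__shared__` memory.
* `threading.Barrier` stands in for `__syncthreads()`.

Each emulated block performs the usual tree reduction over its slice of the input, and the "host" then sums the per-block partial results. The function names `block_sum` and `grid_sum` are illustrative only.

```python
import threading


def block_sum(values, block_dim):
    """Hypothetical emulation of one CUDA thread block reducing `values` in shared memory."""
    # Like the classic CUDA tree reduction, this assumes block_dim is a power of two.
    if block_dim <= 0 or block_dim & (block_dim - 1):
        raise ValueError("block_dim must be a positive power of two")
    if len(values) > block_dim:
        raise ValueError("a block handles at most block_dim elements")

    shared = [0] * block_dim                 # stands in for __shared__ memory
    barrier = threading.Barrier(block_dim)   # stands in for __syncthreads()

    def thread(tid):
        # Each thread loads one element; threads past the end of the input load 0.
        shared[tid] = values[tid] if tid < len(values) else 0
        barrier.wait()
        stride = block_dim // 2
        while stride > 0:
            # Active threads only touch disjoint slots, so no race within a round.
            if tid < stride:
                shared[tid] += shared[tid + stride]
            # Every thread, idle or not, must reach the barrier each round.
            barrier.wait()
            stride //= 2

    threads = [threading.Thread(target=thread, args=(t,)) for t in range(block_dim)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0]


def grid_sum(values, block_dim=8):
    """Hypothetical grid of blocks: each block reduces one slice, the host sums partials."""
    partials = [
        block_sum(values[start:start + block_dim], block_dim)
        for start in range(0, len(values), block_dim)
    ]
    return sum(partials)


if __name__ == "__main__":
    print(grid_sum(list(range(1, 101)), block_dim=8))
```

For example, `grid_sum(list(range(1, 101)), block_dim=8)` returns 5050. Every emulated thread calls `barrier.wait()` the same number of times, including threads that are idle in a given round. This mirrors the CUDA requirement that `__syncthreads()` be reached by all threads of a block.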
Acknowledgement
This work was supported in part by the National Science Council, Taiwan, ROC, under grant numbers NSC 102-2218-E-029-002, NSC 101-2218-E-029-004, and NSC 102-2622-E-029-005-CC3. This work was also supported in part by Tunghai University, Taiwan, ROC, under grant number GREEnS 04-2.
Cite this article
Yang, CT., Liu, JC., Wang, HY. et al. Implementation of GPU virtualization using PCI pass-through mechanism. J Supercomput 68, 183–213 (2014). https://doi.org/10.1007/s11227-013-1034-4