
Implementation of GPU virtualization using PCI pass-through mechanism

The Journal of Supercomputing

Abstract

As a general-purpose, scalable parallel programming model for writing highly parallel applications, CUDA from NVIDIA provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. It has proven quite effective for programming multithreaded many-core GPUs that scale transparently to hundreds of cores; as a result, scientists throughout industry and academia are using CUDA to achieve dramatic speedups on production and research codes. GPU-based clusters are likely to play an essential role in future cloud computing centers, because some computation-intensive applications may require GPUs as well as CPUs. In this paper, we adopted the PCI pass-through technology to set up virtual machines in a virtualized environment, so that a guest could access the NVIDIA graphics card directly and perform CUDA high-performance computing. In this way, the virtual machine has not only virtual CPUs but also a real GPU for computing, and its performance is expected to increase dramatically. We measured the performance difference between physical and virtual machines using CUDA, and investigated how varying the number of virtual CPUs assigned to a virtual machine influences CUDA performance. Finally, we compared the CUDA performance of two open-source virtualization hypervisor environments, with and without PCI pass-through. The experimental results indicate which environment is the most efficient for running CUDA in a virtualized setting.
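
The paper's central claim is that a guest VM given a real GPU through PCI pass-through can run unmodified CUDA workloads at near-native speed. Below is a minimal sketch of such a workload (not taken from the paper; all names are illustrative): a block-wise reduction kernel that exercises exactly the abstractions listed above, a grid of thread blocks, per-block shared memory, and __syncthreads() barriers, and that times itself with CUDA events, which is one simple way to compare kernel runtimes on a physical host against a pass-through guest.

// gpu_reduce.cu -- illustrative sketch only; build with: nvcc -O2 gpu_reduce.cu -o gpu_reduce
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One thread block cooperatively sums a slice of the input in shared memory.
__global__ void blockSum(const float *in, float *out, int n) {
    extern __shared__ float cache[];              // per-block shared memory
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    cache[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();                              // barrier: all loads visible to the block

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) cache[tid] += cache[tid + stride];
        __syncthreads();                          // barrier between reduction steps
    }
    if (tid == 0) out[blockIdx.x] = cache[0];     // one partial sum per block
}

int main() {
    const int n = 1 << 20, threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *hIn  = (float *)malloc(n * sizeof(float));
    float *hOut = (float *)malloc(blocks * sizeof(float));
    for (int i = 0; i < n; ++i) hIn[i] = 1.0f;    // expected total is n

    float *dIn, *dOut;
    cudaMalloc(&dIn,  n      * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    blockSum<<<blocks, threads, threads * sizeof(float)>>>(dIn, dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);       // kernel time, comparable across host and guest

    cudaMemcpy(hOut, dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    double total = 0.0;
    for (int i = 0; i < blocks; ++i) total += hOut[i];
    printf("kernel: %.3f ms, sum = %.0f (expected %d)\n", ms, total, n);

    free(hIn); free(hOut);
    cudaFree(dIn); cudaFree(dOut);
    return 0;
}

Running the same binary on the bare-metal host and inside a VM that receives the GPU via PCI pass-through, and comparing the reported times, mirrors the kind of measurement the paper performs across its hypervisor configurations.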

Acknowledgement

This work was supported in part by the National Science Council, Taiwan, ROC, under grant numbers NSC 102-2218-E-029-002, NSC 101-2218-E-029-004, and NSC 102-2622-E-029-005-CC3. This work was also supported in part by Tunghai University, Taiwan, ROC, under grant number GREEnS 04-2.

Author information

Corresponding author

Correspondence to Chao-Tung Yang.

About this article

Cite this article

Yang, CT., Liu, JC., Wang, HY. et al. Implementation of GPU virtualization using PCI pass-through mechanism. J Supercomput 68, 183–213 (2014). https://doi.org/10.1007/s11227-013-1034-4
