Machine Learning Using Virtualized GPUs in Cloud Environments

Kurkure, Uday; Sivaraman, Hari; Vu, Lan

doi:10.1007/978-3-319-67630-2_41

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10524)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

Using graphics processing units (GPUs) to accelerate machine learning applications has become a focus of high-performance computing (HPC) in recent years. In cloud environments, many cloud-based GPU solutions have been introduced that let users access GPU resources seamlessly and securely without sacrificing their performance benefits. Two main approaches stand out: direct pass-through technologies available on hypervisors, and virtual GPU technologies introduced by GPU vendors. In this paper, we present a performance study of these two GPU virtualization solutions for machine learning in the cloud. We evaluate the advantages and disadvantages of each solution and present new findings on their performance impact on machine learning applications in different real-world use-case scenarios. We also examine the benefits of virtual GPUs both for machine learning alone and for machine learning applications that run together with other GPU-based applications, such as 3D graphics, on the same multi-GPU server to make better use of computing resources. Based on experimental results from benchmarking machine learning applications developed with TensorFlow, we discuss scaling from one to multiple GPUs and compare the performance of the two virtual GPU solutions. Finally, we show that mixing machine learning with other GPU-based workloads can reduce the combined execution time compared with running these workloads sequentially.
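The comparisons named in the abstract (scaling from one to multiple GPUs, overhead of one virtualization solution relative to another, and mixed versus sequential execution) reduce to a few simple ratios. The following is a minimal sketch of those ratios, not the authors' evaluation code: the function names are hypothetical, and the inputs are assumed to be wall-clock times measured by the reader, since the abstract itself reports no numbers.

```python
# Hypothetical helper functions; the names and formulas are illustrative
# assumptions, not taken from the paper.

def speedup(t_one_gpu: float, t_n_gpus: float) -> float:
    """Speedup of an n-GPU run over a single-GPU run of the same job."""
    return t_one_gpu / t_n_gpus


def scaling_efficiency(t_one_gpu: float, t_n_gpus: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup(t_one_gpu, t_n_gpus) / n_gpus


def relative_overhead(t_baseline: float, t_candidate: float) -> float:
    """Extra time of one configuration relative to a baseline,
    for example a virtual GPU run compared with a pass-through run."""
    return (t_candidate - t_baseline) / t_baseline


def mixing_saving(t_ml_alone: float, t_other_alone: float, t_mixed: float) -> float:
    """Fraction of time saved by running two workloads concurrently
    instead of one after the other."""
    sequential = t_ml_alone + t_other_alone
    return (sequential - t_mixed) / sequential
```

A positive value from `mixing_saving` means the concurrent run finished sooner than the two workloads would have sequentially; a negative value means mixing was slower.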

Acknowledgements

The authors would like to thank Josh Simons, Na Zhang, Julie Brodeur, Aravind Bappanadu, and Bruce Herndon for their support for this project.

Author information

Authors and Affiliations

VMware, Palo Alto, CA, 94304, USA
Uday Kurkure, Hari Sivaraman & Lan Vu

Corresponding author

Correspondence to Uday Kurkure.

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum (DKRZ), Hamburg, Hamburg, Germany
Julian M. Kunkel
TITECH, Tokyo, Japan
Rio Yokota
Department of Computer Science, University of Delaware, Newark, Delaware, USA
Michela Taufer
Lawrence Berkeley National Laboratory, Berkeley, California, USA
John Shalf

About this paper

Cite this paper

Kurkure, U., Sivaraman, H., Vu, L. (2017). Machine Learning Using Virtualized GPUs in Cloud Environments. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_41

DOI: https://doi.org/10.1007/978-3-319-67630-2_41
Published: 20 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67629-6
Online ISBN: 978-3-319-67630-2
eBook Packages: Computer Science, Computer Science (R0)
