ABSTRACT
Applications are migrating en masse to the cloud, while accelerators such as GPUs, TPUs, and FPGAs proliferate in the wake of Moore's Law. These technological trends are incompatible. Cloud applications run on virtual platforms, but traditional I/O virtualization techniques have not provided production-ready solutions for accelerators. As a result, cloud providers expose accelerators through pass-through techniques that dedicate physical devices to individual guests. The multi-tenancy that drives their business is lost as a consequence.
This paper proposes automatic generation of virtual accelerator stacks to address the fundamental tradeoffs between virtualization properties and techniques for accelerators. AvA (Automatic Virtualization of Accelerators) re-purposes a para-virtual I/O stack design based on API remoting to present virtual accelerator APIs to guest VMs. Conventional wisdom is that API remoting sacrifices interposition and compatibility. AvA forwards invocations over a hypervisor-managed transport to recover interposition. AvA compensates for lost compatibility by automatically generating guest libraries, drivers, hypervisor-level schedulers, and API servers. AvA supports pluggable transport layers, allowing VMs to use disaggregated accelerators. With AvA, a single developer could virtualize a core subset of OpenCL at near-native performance in just a few days.
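The API-remoting flow described above can be illustrated with a small sketch. It is not AvA's implementation: the class and function names (`ApiServer`, `HypervisorTransport`, `generate_guest_library`), the JSON wire format, and the per-VM call quota are all hypothetical, chosen only to show how generated guest stubs might forward calls through a transport where the hypervisor can interpose.

```python
import json
from typing import Any, Callable, Dict, List


class ApiServer:
    """Hypothetical host-side server that runs forwarded calls on the real API."""

    def __init__(self, api: Dict[str, Callable[..., Any]]) -> None:
        self._api = api

    def handle(self, request: bytes) -> bytes:
        msg = json.loads(request)
        result = self._api[msg["fn"]](*msg["args"])
        return json.dumps({"ret": result}).encode()


class HypervisorTransport:
    """Hypothetical transport. Every forwarded call crosses this layer,
    so a policy (here an assumed per-VM call quota) can be enforced."""

    def __init__(self, server: ApiServer, max_calls_per_vm: int) -> None:
        self._server = server
        self._max_calls = max_calls_per_vm
        self.calls: Dict[str, int] = {}

    def send(self, vm_id: str, request: bytes) -> bytes:
        used = self.calls.get(vm_id, 0)
        if used >= self._max_calls:
            raise PermissionError(f"{vm_id} exceeded its call quota")
        self.calls[vm_id] = used + 1
        return self._server.handle(request)


def generate_guest_library(
    spec: List[str], vm_id: str, transport: HypervisorTransport
) -> Dict[str, Callable[..., Any]]:
    """Build one guest-side stub per function name listed in the API spec."""

    def make_stub(name: str) -> Callable[..., Any]:
        def stub(*args: Any) -> Any:
            # Marshal the call, forward it, and unmarshal the reply.
            request = json.dumps({"fn": name, "args": list(args)}).encode()
            reply = transport.send(vm_id, request)
            return json.loads(reply)["ret"]

        stub.__name__ = name
        return stub

    return {name: make_stub(name) for name in spec}


# Stand-in for an accelerator API; a real stack would wrap e.g. OpenCL calls.
real_api = {"vector_add": lambda a, b: [x + y for x, y in zip(a, b)]}
transport = HypervisorTransport(ApiServer(real_api), max_calls_per_vm=2)
guest = generate_guest_library(["vector_add"], vm_id="vm0", transport=transport)
print(guest["vector_add"]([1, 2], [3, 4]))  # prints [4, 6]
```

Because the stubs are derived mechanically from the spec, supporting another function in this sketch only requires listing it in the spec and providing it on the server side. Swapping `HypervisorTransport` for a network-backed class with the same `send` method would be one way to model a pluggable transport.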