Automatic Virtualization of Accelerators

Authors:

Arthur M. Peters, Amogh Akshintala, Christopher J. Rossbach

HotOS '19: Proceedings of the Workshop on Hot Topics in Operating Systems

Pages 58 - 65

https://doi.org/10.1145/3317550.3321423

Published: 13 May 2019

Abstract

Applications are migrating en masse to the cloud, while accelerators such as GPUs, TPUs, and FPGAs proliferate in the wake of Moore's Law. These two trends are in conflict: cloud applications run on virtual platforms, but traditional I/O virtualization techniques have not provided production-ready solutions for accelerators. As a result, cloud providers expose accelerators through pass-through techniques that dedicate physical devices to individual guests, and the multi-tenancy that drives their business is lost.

This paper proposes automatic generation of virtual accelerator stacks to address the fundamental tradeoffs between virtualization properties and techniques for accelerators. AvA (Automatic Virtualization of Accelerators) repurposes a para-virtual I/O stack design based on API remoting to present virtual accelerator APIs to guest VMs. Conventional wisdom holds that API remoting sacrifices interposition and compatibility. AvA recovers interposition by forwarding invocations over a hypervisor-managed transport. It compensates for lost compatibility by automatically generating guest libraries, drivers, hypervisor-level schedulers, and API servers. AvA supports pluggable transport layers, allowing VMs to use disaggregated accelerators. With AvA, a single developer could virtualize a core subset of OpenCL at near-native performance in just a few days.
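The abstract's central mechanism is API remoting over a hypervisor-managed transport. A small Python sketch can illustrate it. Everything in the sketch is a hypothetical stand-in chosen for illustration:

- `vector_add` plays the role of a real accelerator API call, such as an OpenCL function.
- JSON replaces a real wire format.
- A per-VM call quota stands in for a hypervisor-level scheduler.

AvA generates these components automatically. This hand-written toy is not AvA's code. It only shows why routing every guest call through the hypervisor restores a point of interposition.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List


# Hypothetical host-side accelerator function (stands in for a real API call).
def vector_add(a: List[float], b: List[float]) -> List[float]:
    return [x + y for x, y in zip(a, b)]


class ApiServer:
    """Host-side API server: decodes a forwarded call and runs the real function."""

    def __init__(self, functions: Dict[str, Callable[..., Any]]) -> None:
        self.functions = functions

    def handle(self, message: str) -> str:
        call = json.loads(message)
        result = self.functions[call["fn"]](*call["args"])
        return json.dumps({"result": result})


class HypervisorTransport:
    """Hypervisor-managed channel (hypothetical). Every call crosses it, so the
    hypervisor can interpose; here it counts calls per VM and enforces a quota."""

    def __init__(self, server: ApiServer, quota: int) -> None:
        self.server = server
        self.quota = quota
        self.calls = defaultdict(int)  # vm_id -> number of forwarded calls

    def forward(self, vm_id: str, message: str) -> str:
        if self.calls[vm_id] >= self.quota:
            raise PermissionError(f"{vm_id} exceeded its call quota")
        self.calls[vm_id] += 1
        return self.server.handle(message)


class GuestLibrary:
    """Guest-side stub library: any unknown attribute becomes a remoted API call."""

    def __init__(self, vm_id: str, transport: HypervisorTransport) -> None:
        self.vm_id = vm_id
        self.transport = transport

    def __getattr__(self, name: str) -> Callable[..., Any]:
        def stub(*args: Any) -> Any:
            # Serialize the invocation and send it through the hypervisor.
            message = json.dumps({"fn": name, "args": list(args)})
            reply = self.transport.forward(self.vm_id, message)
            return json.loads(reply)["result"]
        return stub


if __name__ == "__main__":
    server = ApiServer({"vector_add": vector_add})
    transport = HypervisorTransport(server, quota=2)
    guest = GuestLibrary("vm0", transport)
    print(guest.vector_add([1.0, 2.0], [3.0, 4.0]))  # [4.0, 6.0]
```

`GuestLibrary` depends only on a `forward(vm_id, message)` method. Replacing `HypervisorTransport` with, for example, a socket-backed class is therefore one way to picture the pluggable transports the abstract mentions. That substitution is likewise an assumption of this sketch.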

Published In

HotOS '19: Proceedings of the Workshop on Hot Topics in Operating Systems

May 2019

227 pages

ISBN:9781450367271

DOI:10.1145/3317550

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

HotOS '19

Sponsor:

SIGOPS

HotOS '19: Workshop on Hot Topics in Operating Systems

May 13 - 15, 2019

Bertinoro, Italy

Bibliometrics

Article Metrics

Total citations: 7
Total downloads: 454
Downloads (last 12 months): 26
Downloads (last 6 weeks): 3

Reflects downloads up to 17 Feb 2025.

Cited By

Chen L, Liu S, Wang C, Ma H, Qiao Y, Wang Z, Wu C, Lu Y, Feng X, Cui H, Lu S, Xu H, Gavrilovska A, Terry D (2024). A tale of two paths. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, pp. 77-95. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.5555/3691938.3691943

Stojilović M, Rasmussen K, Regazzoni F, Tahoori M, Tessier R (2023). A Visionary Look at the Security of Reconfigurable Cloud Computing. Proceedings of the IEEE 111(12), pp. 1548-1571. Online publication date: Dec-2023.
https://doi.org/10.1109/JPROC.2023.3330729

Diamantopoulos D, Ringlein B, Weiss B, Lantz M, Abel F (2023). Composability of Cloud Accelerators in Virtual World Simulations. 2023 IEEE 16th International Conference on Cloud Computing (CLOUD), pp. 272-274. Online publication date: Jul-2023.
https://doi.org/10.1109/CLOUD60044.2023.00038

Glamočanin O, Shrivastava S, Yao J, Ardo N, Payer M, Stojilović M (2023). Instruction-Level Power Side-Channel Leakage Evaluation of Soft-Core CPUs on Shared FPGAs. Journal of Hardware and Systems Security 7(2-3), pp. 72-99. Online publication date: 4-Oct-2023.
https://doi.org/10.1007/s41635-023-00135-1

Mehrabi A, Sorin D, Lee B (2022). Spatiotemporal Strategies for Long-Term FPGA Resource Management. 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 198-209. Online publication date: May-2022.
https://doi.org/10.1109/ISPASS55109.2022.00026

Diamantopoulos D, Polig R, Ringlein B, Purandare M, Weiss B, Hagleitner C, Lantz M, Abel F (2021). Acceleration-as-a-µService: A Cloud-native Monte-Carlo Option Pricing Engine on CPUs, GPUs and Disaggregated FPGAs. 2021 IEEE 14th International Conference on Cloud Computing (CLOUD), pp. 726-729. Online publication date: Sep-2021.
https://doi.org/10.1109/CLOUD53861.2021.00096

Yu H, Peters A, Akshintala A, Rossbach C, Larus J, Ceze L, Strauss K (2020). AvA: Accelerated Virtualization of Accelerators. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 807-825. Online publication date: 9-Mar-2020.
https://dl.acm.org/doi/10.1145/3373376.3378466
