DOI: 10.1145/3317550.3321423
research-article

Automatic Virtualization of Accelerators

Published: 13 May 2019

ABSTRACT

Applications are migrating en masse to the cloud, while accelerators such as GPUs, TPUs, and FPGAs proliferate as Moore's Law slows. These two trends are at odds. Cloud applications run on virtual platforms, but traditional I/O virtualization techniques have not yielded production-ready solutions for accelerators. As a result, cloud providers expose accelerators through pass-through techniques that dedicate physical devices to individual guests, and consequently lose the multi-tenancy that drives their business.

This paper proposes automatically generating virtual accelerator stacks to address the fundamental tradeoffs between virtualization properties and techniques for accelerators. AvA (Automatic Virtualization of Accelerators) re-purposes a para-virtual I/O stack design based on API remoting to present virtual accelerator APIs to guest VMs. Conventional wisdom holds that API remoting sacrifices interposition and compatibility. AvA recovers interposition by forwarding API invocations over a hypervisor-managed transport, and it compensates for lost compatibility by automatically generating guest libraries, drivers, hypervisor-level schedulers, and API servers. AvA supports pluggable transport layers, which allows VMs to use disaggregated accelerators. With AvA, a single developer could virtualize a core subset of OpenCL at near-native performance in just a few days.
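To make the API-remoting design described above more concrete, here is a minimal Python sketch under stated assumptions. All names (`HOST_API`, `APIServer`, `HypervisorTransport`, `generate_guest_library`) and the JSON wire format are hypothetical illustrations, not AvA's actual interfaces or generated code, and the per-VM call quota is only a stand-in for a hypervisor-level scheduling policy. It shows the general shape: a generated guest stub marshals each call, a hypervisor-managed transport sees every invocation and can enforce policy on it, and a host-side API server dispatches to the real library.

```python
import json
import types
from typing import Any, Callable, Dict, List


# Hypothetical host-side "native" accelerator API; a real system would call a
# vendor library (e.g. an OpenCL runtime) here instead.
def vector_add(a: List[float], b: List[float]) -> List[float]:
    return [x + y for x, y in zip(a, b)]


HOST_API: Dict[str, Callable[..., Any]] = {"vector_add": vector_add}


class APIServer:
    """Host-side API server: decodes a call message and runs the real function."""

    def __init__(self, api: Dict[str, Callable[..., Any]]):
        self.api = api

    def handle(self, message: str) -> str:
        call = json.loads(message)
        result = self.api[call["fn"]](*call["args"])
        return json.dumps({"result": result})


class HypervisorTransport:
    """Hypothetical hypervisor-managed channel. Because every guest call passes
    through forward(), the hypervisor can interpose on it; a per-VM call quota
    stands in here for a real scheduling or isolation policy."""

    def __init__(self, server: APIServer, quota_per_vm: int):
        self.server = server
        self.quota_per_vm = quota_per_vm
        self.calls: Dict[str, int] = {}

    def forward(self, vm_id: str, message: str) -> str:
        used = self.calls.get(vm_id, 0)
        if used >= self.quota_per_vm:
            raise PermissionError(f"{vm_id} exceeded its call quota")
        self.calls[vm_id] = used + 1
        return self.server.handle(message)


def generate_guest_library(api_names: List[str], vm_id: str,
                           transport: HypervisorTransport) -> types.SimpleNamespace:
    """Build a guest-side library with one forwarding stub per API name, so the
    guest application calls the same names it would call natively."""
    stubs: Dict[str, Callable[..., Any]] = {}
    for name in api_names:
        # Bind `name` as a keyword-only default so each stub keeps its own name.
        def stub(*args: Any, _name: str = name) -> Any:
            message = json.dumps({"fn": _name, "args": list(args)})
            reply = transport.forward(vm_id, message)
            return json.loads(reply)["result"]
        stubs[name] = stub
    return types.SimpleNamespace(**stubs)


if __name__ == "__main__":
    transport = HypervisorTransport(APIServer(HOST_API), quota_per_vm=2)
    guest = generate_guest_library(list(HOST_API), "vm1", transport)
    print(guest.vector_add([1, 2], [3, 4]))  # [4, 6]
```

Because the guest stub only ever talks to `forward`, replacing this in-process transport with a network channel would leave the guest side unchanged, which loosely mirrors the pluggable-transport point made in the abstract.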


  • Published in

    ACM Conferences
    HotOS '19: Proceedings of the Workshop on Hot Topics in Operating Systems
    May 2019
    227 pages
    ISBN: 9781450367271
    DOI: 10.1145/3317550

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited
