Disaggregated GPU Acceleration for Serverless Applications

Authors:
Henrique Fingler, University of Texas at Austin, Austin, TX, USA
Zhiting Zhu, University of Texas at Austin, Austin, TX, USA
Esther Yoon, University of Texas at Austin, Austin, TX, USA
Zhipeng Jia, University of Texas at Austin, Austin, TX, USA
Emmett Witchel, University of Texas at Austin, Austin, TX, USA
Christopher J. Rossbach, University of Texas at Austin, Austin, TX, USA

ACM SIGOPS Operating Systems Review, Volume 57, Issue 1, June 2023, pp 10–20. https://doi.org/10.1145/3606557.3606560

Published: 28 June 2023

Abstract

Serverless platforms have been attracting applications from traditional platforms because infrastructure management responsibilities are shifted from users to providers. Many applications well-suited to serverless environments could leverage GPU acceleration to enhance their performance. Unfortunately, current serverless platforms do not expose GPUs to serverless applications.

Published in
ACM SIGOPS Operating Systems Review, Volume 57, Issue 1
SIGOPS
June 2023
53 pages
ISSN: 0163-5980
DOI: 10.1145/3606557
Editors:
Christopher J. Rossbach, Stop D9500, Austin, TX
Kishore Pusukuri, 1910 Nantucket Cir, Santa Clara, CA, USA
Harvard D. Johansen, The Arctic University of Norway
John Chandy, University of Connecticut
Antônio Fröhlich, Federal Univ. of Santa Catarina
Ashvin Goel, University of Toronto
Copyright © 2023 Copyright is held by the owner/author(s)
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States