Middleware conference proceedings, DOI 10.1145/3590140.3629115
research-article

Kernel-as-a-Service: A Serverless Programming Model for Heterogeneous Hardware Accelerators

Published: 27 November 2023

ABSTRACT

With the slowing of Moore's law and decline of Dennard scaling, computing systems increasingly rely on specialized hardware accelerators in addition to general-purpose compute units. Increased hardware heterogeneity necessitates disaggregating applications into workflows of fine-grained tasks that run on a diverse set of CPUs and accelerators. Current accelerator delivery models cannot support such applications efficiently, as (1) the overhead of managing accelerators erases performance benefits for fine-grained tasks; (2) exclusive accelerator use per task leads to underutilization; and (3) specialization increases complexity for developers.

We propose adopting concepts from Function-as-a-Service (FaaS), which has solved these challenges for general-purpose CPUs in cloud computing. Kernel-as-a-Service (KaaS) is a novel serverless programming model for generic compute accelerators that aids heterogeneous workflows by combining the ease-of-use of higher-level abstractions with the performance of low-level hand-tuned code. We evaluate KaaS with a focus on the breadth of the idea and its generality to diverse architectures rather than on an in-depth implementation for a single accelerator. Using proof-of-concept prototypes, we show that this programming model provides performance, performance efficiency, and ease-of-use benefits across a diverse range of compute accelerators. Despite increased levels of abstraction, when compared to a naive accelerator implementation, KaaS reduces completion times for fine-grained tasks by up to 96.0% (GPU), 68.4% (FPGA), 98.6% (TPU), and 34.9% (QPU) in our experiments.
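The reductions quoted above are percentages of completion time, which can be hard to compare at a glance. The following minimal sketch converts them into speedup factors using the standard relation speedup = 1 / (1 - reduction). The function name `speedup_from_reduction` and the `REDUCTIONS` table are illustrative names introduced here, not part of KaaS. The inputs are the "up to" figures from the abstract, so the resulting factors are best cases against the naive baseline rather than typical values.

```python
# Best-case ("up to") reductions in completion time quoted in the abstract, in percent.
# The dictionary and function names are hypothetical helpers for this illustration.
REDUCTIONS = {"GPU": 96.0, "FPGA": 68.4, "TPU": 98.6, "QPU": 34.9}


def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in completion time into a speedup factor.

    A task that takes (1 - r) of its original time runs 1 / (1 - r) times faster.
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    for accelerator, pct in REDUCTIONS.items():
        factor = speedup_from_reduction(pct)
        print(f"{accelerator}: {pct:.1f}% shorter, about {factor:.1f}x faster")
```

Read this way, the best-case GPU result corresponds to roughly a 25x speedup and the TPU result to roughly 71x, while the FPGA and QPU results correspond to roughly 3.2x and 1.5x.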

References

  1. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Google Research. Retrieved May 10, 2023 from https://www.tensorflow.org/Google ScholarGoogle Scholar
  2. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2022. tf.nn.conv2d / TensorFlow v2.11.0. Google Research. Retrieved December 1, 2022 from https://www.tensorflow.org/api_docs/python/tf/nn/conv2dGoogle ScholarGoogle Scholar
  3. Giovanni Agosta, William Fornaciari, Giuseppe Massari, Anna Pupykina, Federico Reghenzani, and Michele Zanella. 2018. Managing Heterogeneous Resources in HPC Systems. In Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (Manchester, United Kingdom) (PARAM-DITAM '18). Association for Computing Machinery (ACM), New York, NY, USA, 7--12. https://doi.org/10.1145/3183767.3183769Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. AMD Xilinx. 2022. Pynq: Python Productivity for Zynq. Retrieved December 2, 2022 from http://pynq.ioGoogle ScholarGoogle Scholar
  5. Krste Asanović. 2014. FireBox: A Hardware Building Block for 2020 Warehouse-Scale Computers. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (Santa Clara, CA, USA) (FAST '14). USENIX, Berkeley, CA, USA.Google ScholarGoogle Scholar
  6. Jose Antonio Ayala-Barbosa and Paul Erick Mendez-Monroy. 2022. A new preemptive task scheduling framework for heterogeneous embedded systems. In Proceedings of the 2022 8th International Conference on Computer Technology Applications (Vienna, Austria) (ICCTA '22). Association for Computing Machinery (ACM), New York, NY, USA, 77--84. https://doi.org/10.1145/3543712.3543756Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Ioana Baldini, Perry Cheng, Stephen J. Fink, Nick Mitchell, Vinod Muthusamy, Rodric Rabbah, Philippe Suter, and Olivier Tardieu. 2017. The Serverless Trilemma: Function Composition for Serverless Computing. In Proceedings of the 2017 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Vancouver, BC, Canada) (Onward! 2017). Association for Computing Machinery (ACM), New York, NY, USA, 89--103. https://doi.org/10.1145/3133850.3133855Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Kirk M. Bresniker, Paolo Faraboschi, Avi Mendelson, Dejan Milojicic, Timothy Roscoe, and Robert N. M. Watson. 2019. Rack-Scale Capabilities: Fine-Grained Protection for Large-Scale Memories. Computer 52, 2 (Feb. 2019), 52--62. https://doi.org/10.1109/MC.2018.2888769Google ScholarGoogle ScholarCross RefCross Ref
  9. Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José MM Montiel, and Juan D. Tardós. 2021. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial, and Multimap SLAM. IEEE Transactions on Robotics 37, 6 (May 2021), 1874--1890. https://doi.org/10.1109/TRO.2021.3075644Google ScholarGoogle ScholarCross RefCross Ref
  10. Yudong Cao, Jonathan Romero, Jonathan P. Olson, Matthias Degroote, Peter D. Johnson, Mária Kieferová, Ian D. Kivlichan, Tim Menke, Borja Peropadre, Nicolas P. D. Sawaya, Sukin Sim, Libor Veis, and Alán Aspuru-Guzik. 2019. Quantum Chemistry in the Age of Quantum Computing. Chemical Reviews 119, 19 (Aug. 2019), 10856--10915. https://doi.org/10.1021/acs.chemrev.8b00803Google ScholarGoogle ScholarCross RefCross Ref
  11. Adrian Caulfield, Paolo Costa, and Monia Ghobadi. 2018. Beyond SmartNICs: Towards a Fully Programmable Cloud. In Proceedings of the 19th International Conference on High Performance Switching and Routing (Bucharest, Romania) (HPSR '18). IEEE, New York, NY, USA, 1--6. https://doi.org/10.1109/HPSR.2018.8850757Google ScholarGoogle ScholarCross RefCross Ref
  12. Ryan Chard, Yadu Babuji, Zhuozhao Li, Tyler Skluzacek, Anna Woodard, Ben Blaiszik, Ian Foster, and Kyle Chard. 2020. funcX: A Federated Function Serving Fabric for Science. In Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (Virtual Event, USA) (HPDC '20). Association for Computing Machinery (ACM), New York, NY, USA, 65--76. https://doi.org/10.1145/3369583.3392683Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Marcin Copik, Marcin Chrapek, Alexandru Calotoiu, and Torsten Hoefler. 2022. Software Resource Disaggregation for HPC with Serverless Computing. Technical Report. Scalable Parallel Computing Lab, ETH Zürich, Zurich, Switzerland.Google ScholarGoogle Scholar
  14. Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, and Torsten Hoefler. 2023. rFaaS: Enabling High Performance Serverless with RDMA and Leases. In Proceedings of the 37th IEEE International Parallel & Distributed Processing Symposium (St. Petersburg, FL, USA) (IPDPDS '23). IEEE, New York, NY, USA.Google ScholarGoogle ScholarCross RefCross Ref
  15. Marco Cuturi and Mathieu Blondel. 2017. Soft-DTW: A Differentiable Loss Function for Time-Series. In Proceedings of the 34th International Conference on Machine Learning (Sydney, NSW, Australia) (ICML '17). Journal of Machine Learning Research, 894--903.Google ScholarGoogle Scholar
  16. Bradley Denby and Brandon Lucia. 2020. Orbital Edge Computing: Nanosatellite Constellations as a New Class of Computer System. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS '20). Association for Computing Machinery (ACM), New York, NY, USA, 939--954. https://doi.org/10.1145/3373376.3378473Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Aditya Dhakal, Sameer G. Kulkarni, and K. K. Ramakrishnan. 2020. GSLICE: Controlled Spatial Sharing of GPUs for a Scalable Inference Platform. In Proceedings of the 11th ACM Symposium on Cloud Computing (Virtual Event, USA) (SoCC '20). Association for Computing Machinery (ACM), New York, NY, USA, 492--506. https://doi.org/10.1145/3419111.3421284Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Aditya Dhakal, Xukan Ran, Yunshu Wang, Jiasi Chen, and K. K. Ramakrishnan. 2022. SLAM-Share: Visual Simultaneous Localization and Mapping for Real-Time Multi-User Augmented Reality. In Proceedings of the 18th International Conference on Emerging Networking EXperiments and Technologies (Rome, Italy) (CoNEXT '22). Association for Computing Machinery (ACM), New York, NY, USA, 293--306. https://doi.org/10.1145/3555050.3569142Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Dong Du, Qingyuan Liu, Xueqiang Jiang, Yubin Xia, Binyu Zang, and Haibo Chen. 2022. Serverless Computing on Heterogeneous Computers. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS '22). Association for Computing Machinery (ACM), New York, NY, USA, 797--813. https://doi.org/10.1145/3503222.3507732Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Nicolas Dube, Duncan Roweth, Paolo Faraboschi, and Dejan Milojicic. 2021. Future of HPC: The Internet of Workflows. IEEE Internet Computing 25, 5 (Aug. 2021), 26--34. https://doi.org/10.1109/MIC.2021.3103236Google ScholarGoogle ScholarCross RefCross Ref
  21. Jorge Ejarque, Rosa M. Badia, Loïc Albertin, Giovanni Aloisio, Enrico Baglione, Yolanda Becerra, Stefan Boschert, Julian R. Berlin, Alessandro D'Anca, Donatello Elia, et al. 2022. Enabling dynamic and intelligent workflows for HPC, data analytics, and AI convergence. Future generation computer systems 134 (Sept. 2022), 414--429. https://doi.org/10.1016/j.future.2022.04.014Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Donatello Elia, Sandro Fiore, and Giovanni Aloisio. 2021. Towards HPC and Big Data Analytics Convergence: Design and Experimental Evaluation of a HPDA Framework for eScience at Scale. IEEE Access 9 (May 2021), 73307--73326. https://doi.org/10.1109/ACCESS.2021.3079139Google ScholarGoogle ScholarCross RefCross Ref
  23. Kayvon Fatahalian, Jeremy Sugerman, and Pat Hanrahan. 2004. Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (Grenoble, France) (HWWS '04). Association for Computing Machinery (ACM), New York, NY, USA, 133--137. https://doi.org/10.1145/1058129.1058148Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Marcel Flottmann, Marc Eisoldt, Julian Gaal, Marc Rothmann, Marco Tassemeier, Thomas Wiemann, and Mario Porrmann. 2021. Energy-efficient FPGA-accelerated LiDAR-based SLAM for embedded robotics. In Proceedings of the 2021 International Conference on Field-Programmable Technology (Auckland, New Zealand) (ICFPT '21). IEEE, New York, NY, USA, 1--6. https://doi.org/10.1109/ICFPT52863.2021.9609934Google ScholarGoogle ScholarCross RefCross Ref
  25. Eitan Frachtenberg. 2021. Experience and Practice Teaching an Undergraduate Course on Diverse Heterogeneous Architectures. In Proceedings of the 2021 IEEE/ACM Ninth Workshop on Education for High Performance Computing (St. Louis, MO, USA) (EduHPC '21). IEEE, New York, NY, USA, 1--8. https://doi.org/10.1109/EduHPC54835.2021.00006Google ScholarGoogle ScholarCross RefCross Ref
  26. Trevor Gale, Matei Zaharia, Cliff Young, and Erich Elsen. 2020. Sparse GPU Kernels for Deep Learning. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Atlanta, GA, USA) (SC '20). IEEE, New York, NY, USA, 1--14. https://doi.org/10.1109/SC41405.2020.00021Google ScholarGoogle ScholarCross RefCross Ref
  27. Rajesh Gandham, Yongpeng Zhang, Kenneth Esler, and Vincent Natoli. 2021. Improving GPU throughput of reservoir simulations using NVIDIA MPS and MIG. In Proceedings of the Fifth EAGE Workshop on High Performance Computing for Upstream (Online). European Association of Geoscientists & Engineers, Houten, The Netherlands, 1--5. https://doi.org/10.3997/2214-4609.2021612025Google ScholarGoogle ScholarCross RefCross Ref
  28. Andreas Gerstmayr, Ken McDonell, Lukas Berk, Mark Goodwin, Marko Myllynen, and Nathan Scott. 2022. Performance Co-Pilot. Red Hat, Inc. Retrieved October 1, 2022 from https://pcp.io/Google ScholarGoogle Scholar
  29. Nicholas Gordon, Kevin Pedretti, and John R. Lange. 2022. Porting the Kitten Lightweight Kernel Operating System to RISC-V. In Proceedings of the International Workshop on Runtime and Operating Systems for Supercomputers (Dallas, TX, USA) (ROSS '22). IEEE, New York, NY, USA, 1--7. https://doi.org/10.1109/ROSS56639.2022.00008Google ScholarGoogle ScholarCross RefCross Ref
  30. Jashwant Raj Gunasekaran, Prashanth Thinakaran, Nachiappan Chidambaram, Mahmut T. Kandemir, and Chita R. Das. 2020. Fifer: Tackling Underutilization in the Serverless Era. (Aug. 2020). arXiv:2008.12819Google ScholarGoogle Scholar
  31. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV, USA) (CVPR 2016). IEEE, New York, NY, USA, 770--778. https://doi.org/10.1109/CVPR.2016.90Google ScholarGoogle ScholarCross RefCross Ref
  32. Scott Hendrickson, Stephen Sturdevant, Tyler Harter, Venkateshwaran Venkataramani, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. Serverless Computation with OpenLambda. In Proceedings of the 8th USENIX Workshop on Hot Topics in Cloud Computing (Denver, CO, USA) (HotCloud '16). USENIX Association, Berkeley, CA, USA.Google ScholarGoogle Scholar
  33. Hewlett Packard Enterprise. 2020. Enabling GPU as a Service -- A Cloud-Like Experience for GPU Infrastructure using Containers (Solution Brief). Retrieved September 11, 2023 from https://www.hpe.com/psnow/doc/a00075067enwGoogle ScholarGoogle Scholar
  34. Anahita Hosseinkhani and Behnam Ghavami. 2021. Improving Soft Error Reliability of FPGA-based Deep Neural Networks with Reduced Approximate TMR. In Proceedings of the 2021 11th International Conference on Computer Engineering and Knowledge (Mashhad, Iran) (ICCKE '21). IEEE, New York, NY, USA, 459--464. https://doi.org/10.1109/ICCKE54056.2021.9721442Google ScholarGoogle ScholarCross RefCross Ref
  35. Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, and Wen-Mei Hwu. 2021. Pylog: An algorithm-centric python-based FPGA programming and synthesis flow. IEEE Trans. Comput. 70, 12 (Oct. 2021), 2015--2028. https://doi.org/10.1109/TC.2021.3123465Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. IBM Quantum. 2021. IBM Quantum Processor Types. Retrieved May 24, 2023 from https://quantum-computing.ibm.com/services/resources/docs/resources/manage/systems/processorsGoogle ScholarGoogle Scholar
  37. IBM Quantum. 2022. Qiskit. Retrieved December 2, 2022 from https://qiskit.org/Google ScholarGoogle Scholar
  38. Al Amjad Tawfiq Isstaif and Richard Mortier. 2023. Towards Latency-Aware Linux Scheduling for Serverless Workloads. In Proceedings of the 1st Workshop on SErverless Systems, Applications and MEthodologies (Rome, Italy) (SESAME '23). Association for Computing Machinery (ACM), New York, NY, USA, 19--26. https://doi.org/10.1145/3592533.3592807Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. 2019. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In Proceedings of the 2019 USENIX Annual Technical Conference (Renton, WA, USA) (ATC '19). USENIX Association, Berkeley, CA, USA, 947--960.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Fauzi Mohd Johar, Farah Ayuni Azmin, Mohamad Kadim Suaidi, Abdul Samad Shibghatullah, Badrul Hisham Ahmad, Siti Nadzirah Salleh, Mohamad Zoinol Abidin Abd Aziz, and Mahfuzah Md Shukor. 2013. A review of genetic algorithms and parallel genetic algorithms on graphics processing unit (GPU). In Proceedings of the 2013 International Conference on Control System, Computing and Engineering (Penang, Malaysia) (ICCSCE '13). IEEE, New York, NY, USA, 264--269. https://doi.org/10.1109/ICCSCE.2013.6719971Google ScholarGoogle ScholarCross RefCross Ref
  41. Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Yadwadkar, et al. 2019. Cloud Programming Simplified: A Berkeley View on Serverless Computing. Technical Report UCB/EECS-2019-3. EECS Department, University of California, Berkeley, Berkeley, CA, USA. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-3.htmlGoogle ScholarGoogle Scholar
  42. Norman P. Jouppi, Doe Hyun Yoon, George Kurian, Sheng Li, Nishant Patil, James Laudon, Cliff Young, and David Patterson. 2020. A Domain-Specific Supercomputer for Training Deep Neural Networks. Commun. ACM 63, 7 (June 2020), 67--78. https://doi.org/10.1145/3360307Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Hamidreza Khaleghzadeh, Ziming Zhong, Ravi Reddy, and Alexey Lastovetsky. 2017. Out-of-core implementation for accelerator kernels on heterogeneous clouds. The Journal of Supercomputing 74, 2 (Sept. 2017), 551--568. https://doi.org/10.1007/s11227-017-2141-4Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Dario Korolija, Timothy Roscoe, and Gustavo Alonso. 2020. Do OS abstractions make sense on FPGAs?. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (Online) (OSDI '20). USENIX Association, Berkeley, CA, USA, 991--1010.Google ScholarGoogle Scholar
  45. Jörn Kuhlenkamp, Sebastian Werner, Maria C. Borges, Dominik Ernst, and Daniel Wenzel. 2020. Benchmarking Elasticity of FaaS Platforms as a Foundation for Objective-driven Design of Serverless Applications. In Proceedings of the 35th Annual ACM Symposium on Applied Computing (Brno, Czech Republic) (SAC '20). Association for Computing Machinery (ACM), New York, NY, USA, 1576--1585. https://doi.org/10.1145/3341105.3373948Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. 2015. Numba: A LLVM-Based Python JIT Compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC (Austin, TX, USA) (LLVM '15). Association for Computing Machinery (ACM), New York, NY, USA, 1--6. https://doi.org/10.1145/2833157.2833162Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Baolin Li, Tirthak Patel, Siddarth Samsi, Vijay Gadepally, and Devesh Tiwari. 2022. Using Multi-Instance GPU for Efficient Operation of Multi-Tenant GPU Clusters. (July 2022). arXiv:2207.11428Google ScholarGoogle Scholar
  48. Junfeng Li, Sameer G. Kulkarni, K. K. Ramakrishnan, and Dan Li. 2019. Understanding Open Source Serverless Platforms: Design Considerations and Performance. In Proceedings of the 5th International Workshop on Serverless Computing (Davis, CA, USA) (WoSC '19). Association for Computing Machinery (ACM), New York, NY, USA, 37--42. https://doi.org/10.1145/3366623.3368139Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Teng Li, Vikram K. Narayana, Esam El-Araby, and Tarek El-Ghazawi. 2011. GPU Resource Sharing and Virtualization on High Performance Computing Systems. In Proceedings of the 2011 International Conference on Parallel Processing (Taipei, Taiwan) (ICPP '11). IEEE, New York, NY, USA, 733--742. https://doi.org/10.1109/ICPP.2011.88Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Fabio Maschi, Dario Korolija, and Gustavo Alonso. 2023. Serverless FPGA: Work-In-Progress. In Proceedings of the 1st Workshop on SErverless Systems, Applications and MEthodologies (Rome, Italy) (SESAME '23). Association for Computing Machinery (ACM), New York, NY, USA, 1--4. https://doi.org/10.1145/3592533.3592804Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Anil Mathew, Vasilios Andrikopoulos, and Frank J. Blaauw. 2021. Exploring the cost and performance benefits of AWS Step Functions using a data processing pipeline. In Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing (Leicester, United Kingdom) (UCC '21). Association for Computing Machinery (ACM), New York, NY, USA, 1--10. https://doi.org/10.1145/3468737.3494084Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Dejan Milojicic, Paolo Faraboschi, Nicolas Dube, and Duncan Roweth. 2021. Future of HPC: Diversifying Heterogeneity. In Proceedings of the 2021 Design, Automation & Test in Europe Conference & Exhibition (Grenoble, France) (DATE '21). IEEE, New York, NY, USA, 276--281. https://doi.org/10.23919/DATE51398.2021.9474063Google ScholarGoogle ScholarCross RefCross Ref
  53. Diana M. Naranjo, Sebastián Risco, Carlos de Alfonso, Alfonso Pérez, Ignacio Blanquer, and Germán Moltó. 2020. Accelerated serverless computing based on GPU virtualization. J. Parallel and Distrib. Comput. 139 (May 2020), 32--42. https://doi.org/10.1016/j.jpdc.2020.01.004Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Anna Maria Nestorov, Josep Lluís Berral, Claudia Misale, Chen Wang, David Carrera, and Alaa Youssef. 2022. Floki: A Proactive Data Forwarding System for Direct Inter-Function Communication for Serverless Workflows. In Proceedings of the Eighth International Workshop on Container Technologies and Container Clouds (Quebec City, QC, Canada) (WoC '22). Association for Computing Machinery (ACM), New York, NY, USA, 13--18. https://doi.org/10.1145/3565384.3565890Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Sam Newman. 2015. Building Microservices. O'Reilly Media, Inc., Sebastopol, CA, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Kim Nguyen and Sam Chung. 2021. Low Maintenance, Low Cost, Highly Secure, and Highly Manageable Serverless Solutions for Software Reverse Engineering. In Proceedings of the Conference on Information Systems Applied Research (Washington, DC, USA) (CONISAR '21). Information Systems and Computing Academic Professionals, 1--10.Google ScholarGoogle Scholar
  57. Kyndylan Nienhuis, Alexandre Joannou, Thomas Bauereiss, Anthony Fox, Michael Roe, Brian Campbell, Matthew Naylor, Robert M. Norton, Simon W. Moore, Peter G. Neumann, Ian Stark, Robert N. M. Watson, and Peter Sewell. 2020. Rigorous engineering for hardware security: Formal modelling and proof in the CHERI design and implementation process. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (San Francisco, CA, USA) (SP '20). IEEE, New York, NY, USA, 1003--1020. https://doi.org/10.1109/SP40000.2020.00055Google ScholarGoogle ScholarCross RefCross Ref
  58. NVIDIA. 2023. Multi-Process Service. Retrieved May 25, 2023 from https://docs.nvidia.com/deploy/mps/index.htmlGoogle ScholarGoogle Scholar
  59. NVIDIA. 2023. NVIDIA Multi-Instance GPU. Retrieved May 25, 2023 from https://www.nvidia.com/en-us/technologies/multi-instance-gpu/Google ScholarGoogle Scholar
  60. Jacob Pan. 2013. RAPL (Running Average Power Limit) driver. Intel Corporation. Retrieved December 2, 2022 from https://lwn.net/Articles/545745/Google ScholarGoogle Scholar
  61. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32. Neural Information Processing Systems Foundation, 8024--8035.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Nathan Pemberton. 2022. The Serverless Datacenter: Hardware and Software Techniques for Resource Disaggregation. Ph.D. Dissertation. University of California, Berkeley, Berkeley, CA, USA. Advisor(s) Randy Katz.Google ScholarGoogle Scholar
  63. Nathan Pemberton and Johann Schleier-Smith. 2019. The Serverless Data Center: Hardware Disaggregation Meets Serverless Computing. In Proceedings of the First Workshop on Resource Disaggregation (Providence, RI, USA) (WORD '19).Google ScholarGoogle Scholar
  64. Nathan Pemberton, Anton Zabreyko, Zhoujie Ding, Randy Katz, and Joseph Gonzalez. 2022. Kernel-as-a-Service: A Serverless Interface to GPUs. (Dec. 2022). arXiv:2212.08146Google ScholarGoogle Scholar
  65. Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O'brien. 2014. A variational eigenvalue solver on a photonic quantum processor. Nature communications 5, 1, Article 4213 (July 2014), 7 pages. https://doi.org/10.1038/ncomms5213Google ScholarGoogle ScholarCross RefCross Ref
  66. Murad Qasaimeh, Kristof Denolf, Jack Lo, Kees Vissers, Joseph Zambreno, and Phillip H. Jones. 2019. Comparing energy efficiency of CPU, GPU and FPGA implementations for vision kernels. In Proceedings of the International 2019 IEEE International Conference on Embedded Software and Systems (Las Vegas, NV, USA) (ICESS '19). IEEE, New York, NY, USA, 1--8. https://doi.org/10.1109/ICESS.2019.8782524Google ScholarGoogle ScholarCross RefCross Ref
  67. Shixiong Qi, Leslie Monis, Ziteng Zeng, Ian-chin Wang, and K. K. Ramakrishnan. 2022. SPRIGHT: Extracting the Server from Serverless Computing! High-Performance EBPF-Based Event-Driven, Shared-Memory Processing. In Proceedings of the ACM SIGCOMM 2022 Conference (Amsterdam, Netherlands) (SIGCOMM '22). Association for Computing Machinery (ACM), New York, NY, USA, 780--794. https://doi.org/10.1145/3544216.3544259Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Issam Raïs, Anne-Cécile Orgerie, and Martin Quinson. 2016. Impact of Shutdown Techniques for Energy-Efficient Cloud Data Centers. In Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing (Granada, Spain) (ICA3PP '16). Springer, Heidelberg, Germany, 203--210. https://doi.org/10.1007/978-3-319-49583-5_15Google ScholarGoogle ScholarCross RefCross Ref
  69. Gourav Rattihalli, Ninad Hogade, Aditya Dhakal, Eitan Frachtenberg, Rolando Pablo Hong Enriquez, Pedro Bruel, Alok Mishra, and Dejan Milojicic. 2023. Fine-Grained Heterogeneous Execution Framework with Energy Aware Scheduling. In Proceedings of the 2023 IEEE 16th International Conference on Cloud Computing (Chicago, IL, USA) (CLOUD '23). IEEE, New York, NY, USA, 35--44. https://doi.org/10.1109/CLOUD60044.2023.00014Google ScholarGoogle ScholarCross RefCross Ref
  70. Sebastián Risco and Germán Moltó. 2021. GPU-Enabled Serverless Workflows for Efficient Multimedia Processing. Journal of Applied Sciences 11, 4 (Feb. 2021), 1438. https://doi.org/10.3390/app11041438Google ScholarGoogle ScholarCross RefCross Ref
  71. Felix Ritter, Tobias Boskamp, A. Homeyer, Hendrik Laue, Michael Schwier, Florian Link, and H.-O. Peitgen. 2011. Medical Image Analysis. IEEE Pulse 2, 6 (Dec. 2011), 60--70. https://doi.org/10.1109/MPUL.2011.942929Google ScholarGoogle ScholarCross RefCross Ref
  72. Andrea Sabbioni, Lorenzo Rosa, Armir Bujari, Luca Foschini, and Antonio Corradi. 2021. A Shared Memory Approach for Function Chaining in Serverless Platforms. In Proceedings of the 2021 IEEE Symposium on Computers and Communications (Athens, Greece) (ISCC '21). IEEE, New York, NY, USA, 1--6. https://doi.org/10.1109/ISCC53001.2021.9631385Google ScholarGoogle ScholarCross RefCross Ref
  73. Marc Sánchez-Artigas and Germán T. Eizaguirre. 2022. A Seer Knows Best: Optimized Object Storage Shuffling for Serverless Analytics. In Proceedings of the 23rd ACM/IFIP International Middleware Conference (Quebec City, QC, Canada) (Middleware '22). Association for Computing Machinery (ACM), New York, NY, USA, 148--160. https://doi.org/10.1145/3528535.3565241Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Trever Schirmer, Joel Scheuner, Tobias Pfandzelter, and David Bermbach. 2022. Fusionize: Improving Serverless Application Performance through Feedback-Driven Function Fusion. In Proceedings of the 10th IEEE International Conference on Cloud Engineering (Asilomar, CA, USA) (IC2E 2022). IEEE, New York, NY, USA, 85--95. https://doi.org/10.1109/IC2E55432.2022.00017Google ScholarGoogle ScholarCross RefCross Ref
  75. Hossein Shafiei, Ahmad Khonsari, and Payam Mousavi. 2022. Serverless Computing: A Survey of Opportunities, Challenges, and Applications. Comput. Surveys 54, 11s (Jan. 2022), 1--32. https://doi.org/10.1145/3510611Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Mohammad Shahrad, Rodrigo Fonseca, Íñigo Goiri, Gohar Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, Mark Russinovich, and Ricardo Bianchini. 2020. Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider. In Proceedings of the 2020 USENIX Annual Technical Conference (Virtual Event, USA) (ATC '20). USENIX Association, Berkeley, CA, USA, 205--218.Google ScholarGoogle Scholar
  77. John Shalf. 2020. The future of computing beyond Moore's Law. Philosophical Transactions of the Royal Society A 378, 2166 (Jan. 2020), 20190061. https://doi.org/10.1098/rsta.2019.0061Google ScholarGoogle ScholarCross RefCross Ref
  78. Prateek Sharma. 2022. Challenges and Opportunities in Sustainable Serverless Computing. In Proceedings of the 1st Workshop on Sustainable Computer Systems Design and Implementation (La Jolla, CA, USA) (HotCarbon '22). USENIX Association, Berkeley, CA, USA.Google ScholarGoogle Scholar
  79. Sushant Sharma, Chung-Hsing Hsu, and Wu-chun Feng. 2006. Making a Case for a Green500 List. In Proceedings of the Proceedings 20th IEEE International Parallel & Distributed Processing Symposium (Rhodes, Greece) (IPDPS '06). IEEE, New York, NY, USA. https://doi.org/10.1109/IPDPS.2006.1639600Google ScholarGoogle ScholarCross RefCross Ref
  80. Prasoon Sinha, Akhil Guliani, Rutwik Jain, Brandon Tran, Matthew D. Sinclair, and Shivaram Venkataraman. 2022. Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (Dallas, TX, USA) (SC '22). IEEE, New York, NY, USA, 1--15. https://doi.org/10.1109/SC41404.2022.00070Google ScholarGoogle ScholarCross RefCross Ref
  81. Sebastian Thrun. 2007. Simultaneous localization and mapping. In Robotics and cognitive approaches to spatial mapping. Springer, 13--41.Google ScholarGoogle Scholar
  82. Paramita Basak Upama, Md Jobair Hossain Faruk, Mohammad Nazim, Mohammad Masum, Hossain Shahriar, Gias Uddin, Shabir Barzanjeh, Sheikh Iqbal Ahamed, and Akond Rahman. 2022. Evolution of Quantum Computing: A Systematic Survey on the Use of Quantum Computing Tools. In Proceedings of the 46th Annual Computers, Software, and Applications Conference (Virtual Event, USA) (COMPSAC '22). IEEE, New York, NY, USA, 520--529. https://doi.org/10.1109/COMPSAC54236.2022.00096Google ScholarGoogle ScholarCross RefCross Ref
  83. Ava Vali, Sara Comai, and Matteo Matteucci. 2020. Deep Learning for Land Use and Land Cover Classification Based on Hyperspectral and Multispectral Earth Observation Data: A Review. Remote Sensing 12, 15 (Aug. 2020), 2495. https://doi.org/10.3390/rs12152495
  84. Blesson Varghese and Rajkumar Buyya. 2018. Next generation cloud computing: New trends and research directions. Future Generation Computer Systems 79 (Feb. 2018), 849--861. https://doi.org/10.1016/j.future.2017.09.020
  85. Ao Wang, Shuai Chang, Huangshi Tian, Hongqi Wang, Haoran Yang, Huiba Li, Rui Du, and Yue Cheng. 2021. FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute. In Proceedings of the 2021 USENIX Annual Technical Conference (Virtual Event, USA) (ATC '21). USENIX Association, Berkeley, CA, USA, 443--457.
  86. Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, and Zheng Zhang. 2019. Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks. (Sept. 2019). arXiv:1909.01315
  87. Zhenning Wang, Jun Yang, Rami Melhem, Bruce Childers, Youtao Zhang, and Minyi Guo. 2015. Simultaneous Multikernel: Fine-Grained Sharing of GPUs. IEEE Computer Architecture Letters 15, 2 (Sept. 2015), 113--116. https://doi.org/10.1109/LCA.2015.2477405
  88. Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A. Curtiss, Rajeev Thakur, and Ian Foster. 2021. Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing. In Proceedings of the 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (St. Louis, MO, USA) (MLHPC '21). IEEE, New York, NY, USA, 9--20. https://doi.org/10.1109/MLHPC54614.2021.00007
  89. Stefan Weinzierl. 2000. Introduction to Monte Carlo methods. (June 2000). arXiv:hep-ph/0006269
  90. Sebastian Werner and Trever Schirmer. 2022. Hardless: A Generalized Serverless Compute Architecture for Hardware Processing Accelerators. In Proceedings of the 10th IEEE International Conference on Cloud Engineering (Asilomar, CA, USA) (IC2E 2022). IEEE, New York, NY, USA, 79--84. https://doi.org/10.1109/IC2E55432.2022.00016
  91. Robert Wille, Rod Van Meter, and Yehuda Naveh. 2019. IBM's Qiskit tool chain: Working with and developing for real quantum computers. In Proceedings of the 2019 Design, Automation & Test in Europe Conference & Exhibition (Florence, Italy) (DATE '19). IEEE, New York, NY, USA, 1234--1240. https://doi.org/10.23919/DATE.2019.8715261
  92. Bo Wu, Xu Liu, Xiaobo Zhou, and Changjun Jiang. 2017. FLEP: Enabling Flexible and Efficient Preemption on GPUs. ACM SIGPLAN Notices 52, 4 (April 2017), 483--496. https://doi.org/10.1145/3093336.3037742
  93. Tsung Tai Yeh, Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann, and Timothy G. Rogers. 2017. Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks. ACM SIGPLAN Notices 52, 8 (Aug. 2017), 221--234. https://doi.org/10.1145/3155284.3018754
  94. Mohamed Zahran. 2016. Heterogeneous Computing: Here to Stay: Hardware and Software Perspectives. Queue 14, 6 (Nov. 2016), 31--42. https://doi.org/10.1145/3028687.3038873
  95. Yue Zha and Jing Li. 2020. Virtualizing FPGAs in the Cloud. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS '20). Association for Computing Machinery (ACM), New York, NY, USA, 845--858. https://doi.org/10.1145/3373376.3378491
  96. Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, and Zheng Wang. 2020. Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures. IEEE Transactions on Parallel and Distributed Systems 31, 8 (March 2020), 1878--1896. https://doi.org/10.1109/TPDS.2020.2978045
  97. Wei Zhang, Quan Chen, Ningxin Zheng, Weihao Cui, Kaihua Fu, and Minyi Guo. 2021. Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs. IEEE Trans. Comput. 71, 4 (March 2021), 866--879. https://doi.org/10.1109/TC.2021.3064352
  98. Chen Zhao, Wu Gao, Feiping Nie, and Huiyang Zhou. 2021. A Survey of GPU Multitasking Methods Supported by Hardware Architecture. IEEE Transactions on Parallel and Distributed Systems 33, 6 (Sept. 2021), 1451--1463. https://doi.org/10.1109/TPDS.2021.3115630
  99. Haidong Zhao, Zakaria Benomar, Tobias Pfandzelter, and Nikolaos Georgantas. 2022. Supporting Multi-Cloud in Serverless Computing. In Proceedings of the 15th IEEE/ACM International Conference on Utility and Cloud Computing Companion (Vancouver, WA, USA) (UCC '22). IEEE, New York, NY, USA, 285--290. https://doi.org/10.1109/UCC56403.2022.00051
  100. Laiping Zhao, Yanan Yang, Yiming Li, Xian Zhou, and Keqiu Li. 2021. Understanding, Predicting and Scheduling Serverless Workloads under Partial Interference. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (St. Louis, MO, USA) (SC '21). Association for Computing Machinery (ACM), New York, NY, USA, 1--15. https://doi.org/10.1145/3458817.3476215

        • Published in

          ACM Conferences
          Middleware '23: Proceedings of the 24th International Middleware Conference
          November 2023
          334 pages
          ISBN:9798400701771
          DOI:10.1145/3590140

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 27 November 2023

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 203 of 948 submissions, 21%
