DOI: 10.1145/3624062.3624238
research-article

Fine-grained accelerator partitioning for Machine Learning and Scientific Computing in Function as a Service Platform

Published: 12 November 2023

ABSTRACT

Function-as-a-Service (FaaS) is a promising execution environment for high-performance computing (HPC) and machine learning (ML) applications, as it offers developers a simple way to write and deploy programs. GPUs and other accelerators are now indispensable for HPC and ML workloads. These accelerators are expensive to acquire and operate; consequently, multiplexing them across workloads can improve their cost-effectiveness. However, we observe that state-of-the-art FaaS frameworks usually treat an accelerator as a single device running a single workload, and offer little support for multiplexing accelerators.

In this work, we present techniques to multiplex GPUs with Parsl, a popular FaaS framework. We demonstrate why GPU multiplexing benefits certain applications and how we implemented it in Parsl. With our enhancements, we show up to 60% lower task completion time and a 250% improvement in the inference throughput of a large language model when multiplexing a GPU, compared to running a single instance without multiplexing. We plan to extend support for GPU multiplexing in FaaS platforms by tackling the challenges of changing the compute resources within a partition and estimating how to right-size a GPU partition for a function.
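The kind of spatial GPU partitioning described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: it assumes NVIDIA's Multi-Process Service (MPS), whose real `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable caps the fraction of streaming multiprocessors an MPS client may use; the `partition_gpus` helper and the per-worker environment scheme are hypothetical names invented here.

```python
# Sketch: split each GPU into fixed fractional slices and pin one FaaS
# worker to each slice via its environment. CUDA_VISIBLE_DEVICES selects
# the GPU; CUDA_MPS_ACTIVE_THREAD_PERCENTAGE (an NVIDIA MPS setting)
# limits the SM share each worker process may occupy.

def partition_gpus(num_gpus, slices_per_gpu):
    """Return one environment dict per worker, covering every GPU slice."""
    if num_gpus <= 0 or slices_per_gpu <= 0:
        raise ValueError("num_gpus and slices_per_gpu must be positive")
    pct = 100 // slices_per_gpu  # SM percentage granted to each worker
    envs = []
    for gpu in range(num_gpus):
        for _ in range(slices_per_gpu):
            envs.append({
                "CUDA_VISIBLE_DEVICES": str(gpu),
                "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE": str(pct),
            })
    return envs

# Two GPUs, four 25% slices each -> eight concurrent workers.
workers = partition_gpus(num_gpus=2, slices_per_gpu=4)
print(len(workers))
```

In a real deployment, each worker process would be launched with its assigned environment so that concurrent functions share a GPU spatially rather than queuing for exclusive access.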


Published in:
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023, 2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062

            Copyright © 2023 ACM

            Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher: Association for Computing Machinery, New York, NY, United States
