ABSTRACT
Function-as-a-service (FaaS) is a promising execution environment for high-performance computing (HPC) and machine learning (ML) applications, as it offers developers a simple way to write and deploy programs. GPUs and other accelerators are now indispensable for HPC and ML workloads. These accelerators are expensive to acquire and operate; consequently, multiplexing them across workloads can improve their cost efficiency. However, we observe that state-of-the-art FaaS frameworks usually treat an accelerator as a single device running a single workload and offer little support for multiplexing accelerators.
In this work, we present techniques to multiplex GPUs with Parsl, a popular FaaS framework. We demonstrate why GPU multiplexing benefits certain applications and describe how we implemented it in Parsl. With our enhancements, multiplexing a GPU yields up to 60% lower task completion time and a 250% improvement in the inference throughput of a large language model, compared to running a single instance without multiplexing. We plan to extend GPU-multiplexing support in FaaS platforms by tackling the challenges of changing the compute resources within a partition and of estimating how to right-size a GPU partition for a function.
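The abstract describes multiplexing one GPU across several FaaS workers. As an illustration only (not the paper's implementation), the worker-to-device assignment at the heart of this idea can be sketched in plain Python: each worker is pinned to a device ID, and listing the same device ID more than once oversubscribes that GPU, with a mechanism such as NVIDIA MPS or MIG arbitrating the concurrent kernels on the hardware side. The helper `assign_accelerators` below is hypothetical; Parsl's actual `HighThroughputExecutor` exposes a similar `available_accelerators` option for this purpose.

```python
# Hypothetical sketch of GPU multiplexing via worker-to-device pinning.
# Listing a device ID multiple times in `accelerators` oversubscribes that
# GPU; NVIDIA MPS (or a MIG partition per entry) would then share it among
# the workers' concurrent kernels.

def assign_accelerators(n_workers, accelerators):
    """Round-robin workers onto the accelerator list, as an executor might
    when setting CUDA_VISIBLE_DEVICES for each worker process."""
    return {w: accelerators[w % len(accelerators)] for w in range(n_workers)}

# Four workers multiplexed onto a single GPU ("0" listed four times) ...
single_gpu = assign_accelerators(4, ["0", "0", "0", "0"])
# ... versus four workers spread across two GPUs, two workers per GPU.
two_gpus = assign_accelerators(4, ["0", "1"])
```

In a real deployment the returned device string would be exported as `CUDA_VISIBLE_DEVICES` in each worker's environment before the function executes.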