DOI: 10.1145/3631295.3631400
research-article

When Serverless Computing Meets Different Degrees of Customization for DNN Inference

Published: 11 December 2023

ABSTRACT

Serverless computing provides a way to develop application services without the overhead of managing a run-time execution environment. Since the initial offerings of serverless computing as function-as-a-service (FaaS), other variants of execution environments have been proposed, such as a special-purpose FaaS (SPF) for deep neural network (DNN) inference and a serverless container service (SCS) for general web applications. This paper qualitatively summarizes the characteristics of a general-purpose FaaS (GPF), SPF, and SCS from the perspective of customizability when setting up execution environments. To judge whether the various serverless computing environments can be feasible solutions for an interactive DNN model inference application, we conduct extensive experiments and conclude that there is room for performance improvement in serverless DNN inference, and that allowing a custom environment setup can make a serverless computing platform a feasible solution for an interactive DNN application.

    • Published in

      WoSC '23: Proceedings of the 9th International Workshop on Serverless Computing
      December 2023
      68 pages
      ISBN:9798400704550
      DOI:10.1145/3631295

      Copyright © 2023 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited
