ABSTRACT
This paper introduces Golgi, a novel scheduling system designed for serverless functions, with the goal of minimizing resource provisioning costs while meeting function latency requirements. To achieve this, Golgi judiciously overcommits functions based on their past resource usage. To ensure that overcommitment does not cause significant performance degradation, Golgi identifies nine low-level metrics that capture the runtime performance of functions, encompassing factors such as request load, resource allocation, and contention on shared resources. These metrics enable accurate prediction of function performance using the Mondrian Forest, a classification model that is continuously updated in real time, without extensive offline training. Golgi employs a conservative exploration-exploitation strategy for request routing. By default, it routes requests to non-overcommitted instances to ensure satisfactory performance. However, it actively explores opportunities to use more resource-efficient overcommitted instances while maintaining the specified latency SLOs. Golgi also performs vertical scaling to dynamically adjust the concurrency of overcommitted instances, maximizing request throughput and enhancing system robustness to prediction errors. We have prototyped Golgi and evaluated it on both an EC2 cluster and a small production cluster. The results show that Golgi can meet the SLOs while reducing the resource provisioning cost by 42% in the EC2 cluster and by 30% in our production cluster.
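The conservative routing policy described above can be illustrated with a minimal sketch. This is not Golgi's implementation: the `Instance` type, the `inflight` metric name, and the `predicts_slo_ok` callback are hypothetical, and the callback merely stands in for the online Mondrian Forest classifier. The sketch only shows the control flow in which an overcommitted instance is used when the predictor judges it safe and a non-overcommitted instance is the fallback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical per-instance metrics; the paper uses nine low-level metrics,
# which are not enumerated in the abstract.
Metrics = Dict[str, float]


@dataclass
class Instance:
    name: str
    overcommitted: bool
    metrics: Metrics


def route(
    instances: List[Instance],
    predicts_slo_ok: Callable[[Metrics], bool],
) -> Optional[Instance]:
    """Pick an instance for one request.

    Assumption: overcommitted instances are probed least-loaded first, and a
    request goes there only if the predictor expects the latency SLO to hold.
    Otherwise the request falls back to the least-loaded non-overcommitted
    instance (the safe default). Returns None if no instance qualifies.
    """
    overcommitted = sorted(
        (i for i in instances if i.overcommitted),
        key=lambda i: i.metrics["inflight"],
    )
    for inst in overcommitted:
        if predicts_slo_ok(inst.metrics):
            return inst

    regular = [i for i in instances if not i.overcommitted]
    if not regular:
        return None
    return min(regular, key=lambda i: i.metrics["inflight"])


def threshold_predictor(metrics: Metrics) -> bool:
    """Toy stand-in for the classifier: safe while few requests are in flight."""
    return metrics["inflight"] < 4


if __name__ == "__main__":
    pool = [
        Instance("regular-0", False, {"inflight": 1.0}),
        Instance("overcommitted-0", True, {"inflight": 2.0}),
        Instance("overcommitted-1", True, {"inflight": 6.0}),
    ]
    chosen = route(pool, threshold_predictor)
    print(chosen.name if chosen else "no instance available")
```

In a real deployment the predictor would be retrained online from observed request latencies, so a misprediction on an overcommitted instance would feed back into later routing decisions; the fixed threshold here does not model that.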
Index Terms
- Golgi: Performance-Aware, Resource-Efficient Function Scheduling for Serverless Computing