GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks

Authors:

Ram Srivatsa Kannan,

Lavanya Subramanian,

Lingjia Tang

EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019

Article No.: 34, Pages 1 - 16

https://doi.org/10.1145/3302424.3303958

Published: 25 March 2019

Abstract

The microservice architecture has dramatically reduced user effort in adopting and maintaining servers by providing a catalog of functions as services that can be used as building blocks to construct applications. This has enabled datacenter operators to manage datacenters hosting microservices quite differently from traditional infrastructures. Such a paradigm shift calls for rethinking the resource management strategies employed in such execution environments. We observe that the visibility enabled by a microservices execution framework can be exploited to achieve high throughput and resource utilization while still meeting Service Level Agreements, especially in multi-tenant execution scenarios.

In this study, we present GrandSLAm, a microservice execution framework that improves utilization of datacenters hosting microservices. GrandSLAm estimates the completion time of requests propagating through individual microservice stages within an application. It then leverages this estimate to drive a runtime system that dynamically batches and reorders requests at each microservice so that individual jobs meet their respective target latency while achieving high throughput. GrandSLAm increases throughput by up to 3x compared to our baseline, without violating SLAs for a wide range of real-world AI and ML applications.
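
The batching and reordering idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Request` fields, the `slack_ms` and `next_batch` helpers, and the assumption that stage latency grows monotonically with batch size are all hypothetical choices made for illustration. The sketch orders queued requests by slack (the time the current stage can spend before the end-to-end deadline is at risk) and then picks the largest batch whose estimated latency still fits the tightest slack.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    name: str
    deadline_ms: float    # absolute end-to-end deadline (hypothetical field)
    downstream_ms: float  # estimated time still needed in later stages (hypothetical)


def slack_ms(req: Request, now_ms: float) -> float:
    """Time the current stage may spend on req without missing its deadline."""
    return req.deadline_ms - now_ms - req.downstream_ms


def next_batch(
    queue: List[Request],
    now_ms: float,
    stage_latency_ms: Callable[[int], float],
    max_batch: int,
) -> List[Request]:
    """Reorder by least slack, then grow the batch while the tightest request still fits.

    Assumes stage_latency_ms(n) does not decrease as n grows.
    """
    ordered = sorted(queue, key=lambda r: slack_ms(r, now_ms))
    if not ordered:
        return []
    tightest = slack_ms(ordered[0], now_ms)
    size = 1  # always serve at least the most urgent request, even if it is already late
    for candidate in range(2, min(max_batch, len(ordered)) + 1):
        if stage_latency_ms(candidate) <= tightest:
            size = candidate
        else:
            break
    return ordered[:size]
```

Calling `next_batch(queue, now_ms, latency_model, max_batch)` with a profiled latency model for the stage returns the requests to execute together next; larger batches raise throughput, and the slack check is what keeps that from pushing the most urgent request past its target.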

Index Terms

GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks
1. Software and its engineering
  1. Software notations and tools
    1. Development frameworks and environments
      1. Software as a service orchestration system

Published In

EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019

March 2019

714 pages

ISBN:9781450362818

DOI:10.1145/3302424

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

EuroSys '19

Sponsor:

SIGOPS

EuroSys '19: Fourteenth EuroSys Conference 2019

March 25 - 28, 2019

Dresden, Germany

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Bibliometrics

Article Metrics

Total Citations: 106
Total Downloads: 1,427
Downloads (Last 12 months): 165
Downloads (Last 6 weeks): 8

Reflects downloads up to 20 Jan 2025
