skip to main content
10.1145/3302424.3303958acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks

Published: 25 March 2019 Publication History

Abstract

The microservice architecture has dramatically reduced user effort in adopting and maintaining servers by providing a catalog of functions as services that can be used as building blocks to construct applications. This has enabled datacenter operators to look at managing datacenter hosting microservices quite differently from traditional infrastructures. Such a paradigm shift calls for a need to rethink resource management strategies employed in such execution environments. We observe that the visibility enabled by a microservices execution framework can be exploited to achieve high throughput and resource utilization while still meeting Service Level Agreements, especially in multi-tenant execution scenarios.
In this study, we present GrandSLAm, a microservice execution framework that improves utilization of datacenters hosting microservices. GrandSLAm estimates time of completion of requests propagating through individual microservice stages within an application. It then leverages this estimate to drive a runtime system that dynamically batches and reorders requests at each microservice in a manner where individual jobs meet their respective target latency while achieving high throughput. GrandSLAm significantly increases throughput by up to 3x compared to the our baseline, without violating SLAs for a wide range of real-world AI and ML applications.

References

[1]
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).
[2]
Paula Aguilera, Katherine Morrow, and Nam Sung Kim. 2014. Fair share: Allocation of GPU resources for both performance and fairness. In Proceedings of the IEEE 32nd International Conference on Computer Design (ICCD 14).
[3]
Amazon. 2019. What is AWS Lambda? https://docs.aws.amazon.com/lambda/latest/dg/welcome.html. (2019).
[4]
Amazon. 2019. What is AWS Step Functions? http://docs.aws.amazon.com/step-functions/latest/dg/welcome.html. (2019).
[5]
Microsoft Azure. 2019. Azure Functions Serverless Architecture. https://azure.microsoft.com/en-us/services/functions/. (2019).
[6]
Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle. 2013. The datacenter as a computer: An introduction to the design of warehouse-scale machines. Synthesis lectures on computer architecture 8, 3 (2013), 1--154.
[7]
Quan Chen, Hailong Yang, Minyi Guo, Ram Srivatsa Kannan, Jason Mars, and Lingjia Tang. 2017. Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 17).
[8]
Quan Chen, Hailong Yang, Jason Mars, and Lingjia Tang. 2016. Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 16).
[9]
Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural Language Processing (almost) from Scratch. CoRR abs/1103.0398 (2011). http://arxiv.org/abs/1103.0398
[10]
Christina Delimitrou and Christos Kozyrakis. 2013. Paragon: QoS-aware Scheduling for Heterogeneous Datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 13).
[11]
Tarek Elgamal, Atul Sandur, Klara Nahrstedt, and Gul Agha. 2018. Costless: Optimizing Cost of Serverless Computing through Function Fusion and Placement. CoRR abs/1811.09721 (2018). arXiv:1811.09721 http://arxiv.org/abs/1811.09721
[12]
Sameh Elnikety, Erich Nahum, John Tracey, and Willy Zwaenepoel. 2004. A Method for Transparent Admission Control and Request Scheduling in e-Commerce Web Sites. In Proceedings of the 13th International Conference on World Wide Web (WWW 04).
[13]
A. Gheith, R. Rajamony, P. Bohrer, K. Agarwal, M. Kistler, B. L. White Eagle, C. A. Hambridge, J. B. Carter, and T. Kaplinger. 2016. IBM Bluemix Mobile Cloud Services. IBM Journal of Research and Development 60, 2-3 (March 2016), 7:1--7:12.
[14]
Google. 2019. Serverless Environment to Build and Connect Cloud Services. https://cloud.google.com/functions/. (2019).
[15]
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. Accurate, large minibatch SGD: training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017).
[16]
Johann Hauswald, Yiping Kang, Michael A. Laurenzano, Quan Chen, Cheng Li, Ronald Dreslinski, Trevor Mudge, Jason Mars, and Lingjia Tang. 2015. Djinn and Tonic: DNN as a Service and Its Implications for Future Warehouse Scale Computers. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA 15).
[17]
Johann Hauswald, Michael A. Laurenzano, Yunqi Zhang, Cheng Li, Austin Rovinski, Arjun Khurana, Ron Dreslinski, Trevor Mudge, Vinicius Petrucci, Lingjia Tang, and Jason Mars. 2015. Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 15).
[18]
Yuxiong He, Sameh Elnikety, James Larus, and Chenyu Yan. 2012. Zeta: Scheduling Interactive Services with Partial Execution. In Proceedings of the Third ACM Symposium on Cloud Computing (SoCC 12).
[19]
Scott Hendrickson, Stephen Sturdevant, Tyler Harter, Venkateshwaran Venkataramani, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. Serverless Computation with OpenLambda. In 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 16).
[20]
IBM. 2019. IBM Cloud Functions. https://www.ibm.com/cloud/functions. (2019).
[21]
Muhammad Hussain Iqbal and Tariq Rahim Soomro. 2015. Big data analysis: Apache storm perspective. International journal of computer trends and technology 19, 1 (2015), 9--14.
[22]
Virajith Jalaparti, Peter Bodik, Srikanth Kandula, Ishai Menache, Mikhail Rybalkin, and Chenyu Yan. 2013. Speeding Up Distributed Request-response Workflows. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM.
[23]
Adwait Jog, Evgeny Bolotin, Zvika Guz, Mike Parker, Stephen W. Keckler, Mahmut T. Kandemir, and Chita R. Das. 2014. Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications. In Proceedings of Workshop on General Purpose Processing Using GPUs (GPGPU 14).
[24]
Evangelia Kalyvianaki, Marco Fiscato, Theodoros Salonidis, and Peter Pietzuch. 2016. THEMIS: Fairness in Federated Stream Processing Under Overload. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD 16).
[25]
Svilen Kanev, Juan Pablo Darago, Kim Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, and David Brooks. 2015. Profiling a Warehouse-scale Computer. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA 15).
[26]
S. Kanev, K. Hazelwood, G. Y. Wei, and D. Brooks. 2014. Tradeoffs between power management and tail latency in warehouse-scale applications. In IEEE International Symposium on Workload Characterization (IISWC 14).
[27]
R. S. Kannan, A. Jain, M. A. Laurenzano, L. Tang, and J. Mars. 2018. Proctor: Detecting and Investigating Interference in Shared Datacenters. In 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 18).
[28]
O. Kayiran, N. C. Nachiappan, A. Jog, R. Ausavarungnirun, M. T. Kandemir, G. H. Loh, O. Mutlu, and C. R. Das. 2014. Managing GPU Concurrency in Heterogeneous Architectures. In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 14).
[29]
Kris Kobylinski. 2015. Agile Software Development for Bluemix with IBM DevOps Services. In Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering (CASCON 15).
[30]
Jason Mars and Lingjia Tang. 2013. Whare-map: Heterogeneity in "Homogeneous" Warehouse-scale Computers. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA 13).
[31]
Jason Mars, Lingjia Tang, Robert Hundt, Kevin Skadron, and Mary Lou Soffa. 2011. Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 11).
[32]
Sean Marston, Zhi Li, Subhajyoti Bandyopadhyay, Juheng Zhang, and Anand Ghalsasi. 2011. Cloud Computing - The Business Perspective. Decis. Support Syst. 51, 1 (April 2011), 14.
[33]
David Meisner and Thomas F. Wenisch. 2012. DreamWeaver: Architectural Support for Deep Sleep. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 12).
[34]
V. Nagarajan, R. Hariharan, V. Srinivasan, R. S. Kannan, P. Thinakaran, V. Sankaran, B. Vasudevan, R. Mukundrajan, N. C. Nachiappan, A. Sridharan, K. P. Saravanan, V. Adhinarayanan, and V. V. Sankaranarayanan. 2012. SCOC IP Cores for Custom Built Supercomputing Nodes. In IEEE Computer Society Annual Symposium on VLSI (ISVLSI 12).
[35]
V. Nagarajan, K. Lakshminarasimhan, A. Sridhar, P. Thinakaran, R. Hariharan, V. Srinivasan, R. S. Kannan, and A. Sridharan. 2013. Performance and energy efficient cache system design: Simultaneous execution of multiple applications on heterogeneous cores. In IEEE Computer Society Annual Symposium on VLSI (ISVLSI 13).
[36]
V. Nagarajan, V. Srinivasan, R. Kannan, P. Thinakaran, R. Hariharan, B. Vasudevan, N. C. Nachiappan, K. P. Saravanan, A. Sridharan, V. Sankaran, V. Adhinarayanan, V. S. Vignesh, and R. Mukundrajan. 2012. Compilation Accelerator on Silicon. In IEEE Computer Society Annual Symposium on VLSI (ISVLSI 12).
[37]
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, et al. 2011. The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding.
[38]
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, and Doug Burger. 2014. A Reconfigurable Fabric for Accelerating Large-scale Datacenter Services. In Proceeding of the 41st Annual International Symposium on Computer Architecuture (ISCA 14).
[39]
Amazon Web Services. 2017. The Image Recognition and Processing Backend reference architecture demonstrates how to use AWS Step Functions to orchestrate a serverless processing workflow using AWS Lambda, Amazon S3, Amazon DynamoDB and Amazon Rekognition. https://github.com/aws-samples/lambda-refarch-imagerecognition. (2017).
[40]
Samuel L Smith, Pieter-Jan Kindermans, and Quoc V Le. 2017. Don't Decay the Learning Rate, Increase the Batch Size. arXiv preprint arXiv:1711.00489 (2017).
[41]
Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Khan, and Onur Mutlu. 2015. The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-application Interference at Shared Caches and Main Memory. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO 15).
[42]
Lalith Suresh, Peter Bodik, Ishai Menache, Marco Canini, and Florin Ciucu. 2017. Distributed Resource Management Across Process Boundaries. In Proceedings of the 2017 Symposium on Cloud Computing (SoCC 17).
[43]
Shanjiang Tang, BingSheng He, Shuhao Zhang, and Zhaojie Niu. 2016. Elastic Multi-resource Fairness: Balancing Fairness and Efficiency in Coupled CPU-GPU Architectures. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 16).
[44]
Prashanth Thinakaran, Jashwant Raj Gunasekaran, Bikash Sharma, Mahmut Taylan Kandemir, and Chita R Das. 2017. Phoenix: a constraint-aware scheduler for heterogeneous datacenters. In IEEE 37th International Conference on Distributed Computing Systems (ICDCS 17).
[45]
Prashanth Thinakaran, Jashwant Raj, Bikash Sharma, Mahmut T Kandemir, and Chita R Das. 2018. The Curious Case of Container Orchestration and Scheduling in GPU-based Datacenters. In Proceedings of the ACM Symposium on Cloud Computing (SoCC 18).
[46]
T. Ueda, T. Nakaike, and M. Ohara. 2016. Workload characterization for microservices. In IEEE International Symposium on Workload Characterization (IISWC 16).
[47]
Balajee Vamanan, Hamza Bin Sohail, Jahangir Hasan, and T. N. Vijaykumar. 2015. TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for Online Search. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO 15).
[48]
Hailong Yang, Alex Breslow, Jason Mars, and Lingjia Tang. 2013. Bubble-flux: Precise Online QoS Management for Increased Utilization in Warehouse Scale Computers. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA 13).
[49]
Hailong Yang, Quan Chen, Moeiz Riaz, Zhongzhi Luan, Lingjia Tang, and Jason Mars. 2017. PowerChief: Intelligent Power Allocation for Multi-Stage Applications to Improve Responsiveness on Power Constrained CMP. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA 17).
[50]
Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, and Ion Stoica. 2016. Apache Spark: A Unified Engine for Big Data Processing. Commun. ACM 59, 11 (Oct. 2016), 10.
[51]
Yilei Zhang, Zibin Zheng, and M.R. Lyu. 2011. Exploring Latent Features for Memory-Based QoS Prediction in Cloud Computing. In IEEE Symposium on Reliable Distributed Systems (SRDS 11).

Cited By

View all
  • (2025)Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data CentersIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.349487936:1(67-83)Online publication date: Jan-2025
  • (2024)When will my ML job finish? toward providing completion time estimates through predictability-centric schedulingProceedings of the 18th USENIX Conference on Operating Systems Design and Implementation10.5555/3691938.3691964(487-505)Online publication date: 10-Jul-2024
  • (2024)Software compliance in various industries using CI/CD, dynamic microservices, and containersOpen Computer Science10.1515/comp-2024-001314:1Online publication date: 12-Jul-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019
March 2019
714 pages
ISBN:9781450362818
DOI:10.1145/3302424
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Microservice
  2. Systems and Machine Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '19
Sponsor:
EuroSys '19: Fourteenth EuroSys Conference 2019
March 25 - 28, 2019
Dresden, Germany

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)165
  • Downloads (Last 6 weeks)8
Reflects downloads up to 20 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data CentersIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.349487936:1(67-83)Online publication date: Jan-2025
  • (2024)When will my ML job finish? toward providing completion time estimates through predictability-centric schedulingProceedings of the 18th USENIX Conference on Operating Systems Design and Implementation10.5555/3691938.3691964(487-505)Online publication date: 10-Jul-2024
  • (2024)Software compliance in various industries using CI/CD, dynamic microservices, and containersOpen Computer Science10.1515/comp-2024-001314:1Online publication date: 12-Jul-2024
  • (2024)Performance Modeling of Distributed Data Processing in Microservice ApplicationsACM Transactions on Modeling and Performance Evaluation of Computing Systems10.1145/36872659:4(1-30)Online publication date: 20-Aug-2024
  • (2024)SpongeProceedings of the 4th Workshop on Machine Learning and Systems10.1145/3642970.3655833(184-191)Online publication date: 22-Apr-2024
  • (2024)ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUsProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658657(42-55)Online publication date: 3-Jun-2024
  • (2024)DRPC: Distributed Reinforcement Learning Approach for Scalable Resource Provisioning in Container-based ClustersIEEE Transactions on Services Computing10.1109/TSC.2024.3433388(1-12)Online publication date: 2024
  • (2024)Adaptive QoS-Aware Microservice Deployment With Excessive Loads via Intra- and Inter-Datacenter SchedulingIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.342593135:9(1565-1582)Online publication date: Sep-2024
  • (2024)Joint Optimization of Parallelism and Resource Configuration for Serverless Function StepsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.336513435:4(560-576)Online publication date: Apr-2024
  • (2024)Graft: Efficient Inference Serving for Hybrid Deep Learning With SLO Guarantees via DNN Re-AlignmentIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.334051835:2(280-296)Online publication date: Feb-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media