ABSTRACT
Containerized deployment of microservices has become prevalent because it provides flexible deployment and elastic resource configuration. For high concurrency and fault tolerance, multiple container replicas are often deployed for each microservice component, but this can induce heavy cross-machine traffic and degrade the performance of microservice applications. Traffic localization places containers that exchange heavy traffic on the same machine to reduce cross-machine traffic. However, containers with heavy mutual traffic still commonly end up on different machines, especially under multi-replica deployment, because a single physical machine has insufficient resources to host them all. To address this, we develop OptTraffic, a network-aware scheduling system that realizes optimized traffic scheduling for containerized microservices. OptTraffic estimates the traffic between each pair of containers in a lightweight manner by combining a simple mathematical calculation with coarse-grained monitoring. It then applies an efficient traffic allocation algorithm and leverages dynamic scheduling with multiple optimizations to minimize cross-machine traffic without sacrificing resource usage balance. Experiments on real-world microservice applications show that, under multi-replica deployment, OptTraffic saves up to 47% of the network bandwidth and reduces the P99 latency by 28%-45% compared to Kubernetes and existing traffic localization designs.
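The quantity the abstract aims to minimize, cross-machine traffic, can be illustrated with a small Python sketch. This is not OptTraffic's algorithm: the container names, traffic volumes, the two-machine placements, and the function `cross_machine_traffic` are all hypothetical, chosen only to show why placement alone may leave residual cross-machine traffic under multi-replica deployment, and why shifting traffic between replicas could help.

```python
from typing import Dict, Tuple

# Traffic volume between a (source, destination) container pair (arbitrary units).
Traffic = Dict[Tuple[str, str], float]


def cross_machine_traffic(traffic: Traffic, placement: Dict[str, str]) -> float:
    """Sum the traffic of every container pair whose endpoints sit on different machines."""
    return sum(
        volume
        for (src, dst), volume in traffic.items()
        if placement[src] != placement[dst]
    )


# Hypothetical workload: two frontend replicas, two backend replicas,
# with requests spread uniformly over the backend replicas.
uniform: Traffic = {
    ("front-1", "back-1"): 40.0,
    ("front-1", "back-2"): 40.0,
    ("front-2", "back-1"): 40.0,
    ("front-2", "back-2"): 40.0,
}

# Assumption: each machine has room for only two containers.
by_component = {"front-1": "m1", "front-2": "m1", "back-1": "m2", "back-2": "m2"}
mixed = {"front-1": "m1", "back-1": "m1", "front-2": "m2", "back-2": "m2"}

spread_cost = cross_machine_traffic(uniform, by_component)  # every pair crosses machines
localized_cost = cross_machine_traffic(uniform, mixed)      # half of the pairs still cross

# Illustrative only: if each frontend replica sent all of its traffic to the
# co-located backend replica, the mixed placement would carry no cross-machine traffic.
skewed: Traffic = {("front-1", "back-1"): 80.0, ("front-2", "back-2"): 80.0}
allocated_cost = cross_machine_traffic(skewed, mixed)

print(spread_cost, localized_cost, allocated_cost)
```

In this toy setting, co-locating one replica of each component on each machine halves the cross-machine traffic, while the remainder depends on how traffic is distributed across replicas rather than on where containers are placed.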
Index Terms
- On Optimizing Traffic Scheduling for Multi-replica Containerized Microservices