research-article

On Optimizing Traffic Scheduling for Multi-replica Containerized Microservices

Authors:

Xianzhi Zhu, Department of Computer Science and Technology, University of Science and Technology of China, China (ORCID 0000-0002-5120-5917)
Yongkun Li, University of Science and Technology of China, China (ORCID 0000-0002-3743-8511)
Lulu Yao, University of Science and Technology of China, China (ORCID 0000-0001-9116-0330)
Zhihao Qi, University of Science and Technology of China, China (ORCID 0009-0005-4501-3539)
Yinlong Xu, Anhui Province Key Laboratory of High Performance Computing, China, and University of Science and Technology of China, China (ORCID 0000-0001-9586-0561)
Pengcheng Wang, Huawei, China (ORCID 0009-0009-8765-2856)
Weiguang Wang, Huawei, China (ORCID 0009-0004-3301-465X)
Xia Zhu, Huawei, China (ORCID 0009-0006-7228-637X)

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing, August 2023, pages 358–368. https://doi.org/10.1145/3605573.3605646

Published: 13 September 2023


ABSTRACT

Containerized deployment of microservices has become prevalent because it offers flexible deployment and elastic resource configuration. For high concurrency and fault tolerance, multiple container replicas are often deployed for each microservice component, but this can induce heavy cross-machine traffic and degrade the performance of microservice applications. Traffic localization tries to place containers that communicate heavily on the same machine to reduce cross-machine traffic. However, because a single physical machine has limited resources, containers with heavy mutual traffic still frequently end up on different machines, especially under multi-replica deployment. To this end, we develop OptTraffic, a network-aware scheduling system that optimizes traffic scheduling for containerized microservices. OptTraffic estimates the traffic between each pair of containers in a lightweight manner by combining a simple mathematical calculation with coarse-grained monitoring. It then applies an efficient traffic allocation algorithm and uses dynamic scheduling with multiple optimizations to minimize cross-machine traffic without sacrificing resource usage balance. Experiments on real-world microservice applications show that, under multi-replica deployment, OptTraffic saves up to 47% of network bandwidth and reduces P99 latency by 28%-45% compared to Kubernetes and existing traffic localization designs.
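The abstract does not spell out OptTraffic's estimation formula. As a rough illustration only, the Python sketch below assumes that traffic is observed at service granularity (coarse-grained monitoring) and that each caller replica spreads its requests uniformly over all callee replicas, as round-robin load balancing would. Under that assumption, service-level traffic can be split into per-container-pair estimates, and the cross-machine traffic of a placement is the sum over pairs that sit on different machines. The function names, the `frontend`/`backend` services, and the machine IDs are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def estimate_pair_traffic(service_traffic, replicas):
    """Split service-level traffic into per-container-pair estimates.

    service_traffic: {(src_service, dst_service): rate}, e.g. bytes/s
        observed by coarse-grained, per-service monitoring (hypothetical input).
    replicas: {service: [container_id, ...]}

    Assumption (not from the paper): load balancing is uniform, so each of
    the len(srcs) * len(dsts) replica pairs carries an equal share.
    """
    pair_traffic = defaultdict(float)
    for (src, dst), rate in service_traffic.items():
        srcs, dsts = replicas[src], replicas[dst]
        share = rate / (len(srcs) * len(dsts))
        for a, b in product(srcs, dsts):
            pair_traffic[(a, b)] += share
    return dict(pair_traffic)


def cross_machine_traffic(pair_traffic, placement):
    """Sum the estimated traffic of container pairs on different machines.

    placement: {container_id: machine_id}
    """
    return sum(
        rate
        for (a, b), rate in pair_traffic.items()
        if placement[a] != placement[b]
    )


if __name__ == "__main__":
    # Hypothetical app: 2 frontend replicas call 2 backend replicas at 100 MB/s total.
    service_traffic = {("frontend", "backend"): 100.0}
    replicas = {"frontend": ["f1", "f2"], "backend": ["b1", "b2"]}
    pairs = estimate_pair_traffic(service_traffic, replicas)  # 25.0 per pair

    colocated = {"f1": "m1", "b1": "m1", "f2": "m2", "b2": "m2"}
    split_by_service = {"f1": "m1", "f2": "m1", "b1": "m2", "b2": "m2"}
    print(cross_machine_traffic(pairs, colocated))         # 50.0
    print(cross_machine_traffic(pairs, split_by_service))  # 100.0
```

In this toy example, placing one frontend and one backend replica on each machine still leaves half of the traffic crossing machines, because uniform load balancing also sends requests to remote replicas. That residual traffic is the kind of gap a traffic allocation step, as mentioned in the abstract, would aim to shrink.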


Index Terms

Networks → Network services → Cloud computing


Published in

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing
August 2023
858 pages
ISBN: 9798400708435
DOI: 10.1145/3605573

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Cloud computing
Container orchestration system
Kubernetes
Microservice
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 91 of 313 submissions, 29%

Article Metrics
- Total Citations: 0
- Total Downloads: 182
- Downloads (last 12 months): 182
- Downloads (last 6 weeks): 27
