ABSTRACT
Co-locating Latency-Critical (LC) and Best-Effort (BE) services in edge-clouds is expected to improve resource utilization. However, this mixed deployment faces unique challenges. Edge-clouds are heterogeneous, distributed, and resource-constrained, which leads to intense competition for edge resources and makes it difficult to balance fluctuating co-located workloads. Prior work on cloud datacenters does not apply directly, since it does not account for these characteristics of the edge. The few works that do propose schemes for edge workload co-location fail to address the major challenges simultaneously.
In this paper, we propose Tango, a harmonious management and scheduling framework for Kubernetes-based edge-cloud systems with mixed services, to address these challenges. Tango incorporates novel components and mechanisms for elastic resource allocation, together with two traffic scheduling algorithms that effectively manage distributed edge resources. Tango is harmonious not only in the mixed services it supports, but also in its collaborative solutions, which complement one another. Built on a backward-compatible design, Tango extends Kubernetes with automatic scaling and traffic scheduling capabilities. Experiments on large-scale hybrid edge-clouds, driven by real workload traces, show that Tango improves system resource utilization by 36.9%, the QoS-guarantee satisfaction rate by 11.3%, and throughput by 47.6% compared to state-of-the-art approaches.
Index Terms
- Tango: Harmonious Management and Scheduling for Mixed Services Co-located among Distributed Edge-Clouds