DOI: 10.1145/3605573.3605589
Research article · ICPP Conference Proceedings

Tango: Harmonious Management and Scheduling for Mixed Services Co-located among Distributed Edge-Clouds

Published: 13 September 2023

ABSTRACT

Co-locating Latency-Critical (LC) and Best-Effort (BE) services in edge-clouds is expected to enhance resource utilization. However, this mixed deployment faces unique challenges: edge-clouds are heterogeneous, distributed, and resource-constrained, which intensifies competition for edge resources and makes it difficult to balance fluctuating co-located workloads. Prior work on cloud datacenters does not transfer, because it does not account for these distinctive properties of edges, and the few works that do provide specific schemes for edge workload co-location fail to address the major challenges simultaneously.

In this paper, we propose Tango, a harmonious management and scheduling framework for Kubernetes-based edge-cloud systems with mixed services, to address these challenges. Tango incorporates novel components and mechanisms for elastic resource allocation, along with two traffic scheduling algorithms that effectively manage distributed edge resources. Tango is harmonious not only in the compatible mixed services it supports but also in its collaborative solutions, which complement each other. Built on a backward-compatible design, Tango extends Kubernetes with automatic scaling and traffic scheduling capabilities. Experiments on large-scale hybrid edge-clouds, driven by real workload traces, show that Tango improves system resource utilization by 36.9%, the QoS-guarantee satisfaction rate by 11.3%, and throughput by 47.6% compared to state-of-the-art approaches.
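To make the co-location trade-off concrete, the sketch below illustrates (in Python) the kind of QoS-aware dispatch decision a traffic scheduler for mixed LC/BE services must make. The `EdgeNode` fields and `dispatch` function are hypothetical names for illustration only, not Tango's actual algorithms or API: LC requests are routed to the node with the most headroom to protect QoS, while BE requests are bin-packed onto the fullest node that still fits, raising utilization.

```python
# Illustrative sketch only: EdgeNode and dispatch() are hypothetical,
# not Tango's real interface. The policy shown is a simple heuristic:
# headroom-first for LC traffic, bin-packing for BE traffic.
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    capacity: float  # total resource units on this edge node
    used: float      # units consumed by services already placed here

    @property
    def spare(self) -> float:
        return max(self.capacity - self.used, 0.0)


def dispatch(nodes: list[EdgeNode], demand: float, latency_critical: bool):
    """Pick a node for one request and reserve its demand; None if no fit.

    LC requests go to the node with the MOST spare capacity (protect QoS);
    BE requests go to the feasible node with the LEAST spare capacity
    (pack tightly to raise utilization).
    """
    feasible = [n for n in nodes if n.spare >= demand]
    if not feasible:
        return None
    if latency_critical:
        chosen = max(feasible, key=lambda n: n.spare)
    else:
        chosen = min(feasible, key=lambda n: n.spare)
    chosen.used += demand
    return chosen.name
```

The two branches embody the tension the paper targets: headroom-first keeps LC tail latency safe, while tight packing of BE work recovers the utilization that conservative LC placement would otherwise waste.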


Published in
ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing
August 2023, 858 pages
ISBN: 9798400708435
DOI: 10.1145/3605573

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher
Association for Computing Machinery, New York, NY, United States



Qualifiers: research-article, refereed limited

Acceptance Rates
Overall acceptance rate: 91 of 313 submissions, 29%
