Abstract:
On-premise edge clouds provide opportunities to deliver latency-critical ML-based services to resource-constrained end devices. The edge services are deployed as loosely coupled microservices using cloud orchestrators such as Kubernetes, and a load balancer distributes requests from an upstream microservice instance (client) across many downstream microservice instances (servers). However, in a shared environment, transient and sporadic delay events are common due to contention for host and network resources (e.g., high load on servers, high network queuing delays). To meet the low-latency requirements of edge services, the load balancer should quickly adapt to such delay events and adjust its routing decisions (e.g., pick the best downstream instance among all). In this paper, we propose DL3, a distributed load balancer (LB) that quickly adapts to server load and network queuing delays by adjusting routing decisions so that requests are forwarded to the best possible servers. The key idea is to give the LB visibility into both the servers' load and the transient delays on the network paths toward the servers. We prototype DL3 on a Kubernetes-managed edge cloud cluster and evaluate its performance for a latency-sensitive ML-based object detection service. Our preliminary results show that DL3 improves tail response time by 33% compared to a state-of-the-art load-balancing mechanism.
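To make the key idea concrete, below is a minimal hypothetical sketch of the kind of server selection the abstract describes: choosing the downstream instance with the lowest combined estimate of server-side load delay and network-path queuing delay. The abstract does not publish DL3's actual selection logic, so the type names, fields, and the additive delay model here are assumptions for illustration only.

```go
// Hypothetical sketch of DL3-style routing: forward each request to the
// downstream instance with the lowest combined estimated delay. All names
// and the additive cost model are illustrative, not from the paper.
package main

import "fmt"

// Server captures the load balancer's visibility into one downstream instance.
type Server struct {
	Addr        string
	LoadDelayMs float64 // estimated queuing/processing delay at the server
	NetDelayMs  float64 // estimated transient queuing delay on the network path
}

// pickBest returns the server with the lowest total estimated delay.
func pickBest(servers []Server) *Server {
	var best *Server
	for i := range servers {
		s := &servers[i]
		if best == nil || s.LoadDelayMs+s.NetDelayMs < best.LoadDelayMs+best.NetDelayMs {
			best = s
		}
	}
	return best
}

func main() {
	servers := []Server{
		{Addr: "10.0.0.1:8080", LoadDelayMs: 12, NetDelayMs: 3},
		{Addr: "10.0.0.2:8080", LoadDelayMs: 4, NetDelayMs: 9},
		{Addr: "10.0.0.3:8080", LoadDelayMs: 6, NetDelayMs: 2}, // lowest total: 8 ms
	}
	fmt.Println("route to:", pickBest(servers).Addr)
}
```

In this toy model, a server with moderate load but a lightly loaded network path can beat a less loaded server behind a congested path, which is the kind of decision a load balancer with visibility into only one of the two signals would miss.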
Date of Conference: 28-31 October 2024
Date Added to IEEE Xplore: 31 December 2024