Abstract:
We consider online convex optimization (OCO) with multi-slot feedback delay. An agent selects a sequence of online decisions to minimize the accumulation of time-varying ...Show MoreMetadata
Abstract:
We consider online convex optimization (OCO) with multi-slot feedback delay. An agent selects a sequence of online decisions to minimize the accumulation of time-varying convex loss functions, subject to short-term and long-term constraints that may be time-varying. Both the convex loss function and the long-term constraint function may experience multiple time slots of feedback delay to be received by the agent. Existing works on OCO under this general setting has focused on the static regret, which measures the gap of losses between an online decision sequence and a time-invariant static offline benchmark. In this work, besides the static regret, we also consider a more practically meaningful metric, the dynamic regret, where the benchmark is a time-varying online optimal decision sequence. We propose an efficient algorithm, termed Delay-Tolerant Constrained-OCO (DTC-OCO), which uses a novel double regularization together with a new penalty mechanism on the long-term constraint violation, to tackle the asynchrony between information feedback and decision updates. We obtain upper bounds for its static regret, dynamic regret, and constraint violation, proving that they are sublinear under mild conditions. Furthermore, we consider a variation of DTC-OCO with multi-step gradient descent, and show it provides improved dynamic regret and constraint violation bounds for strongly convex loss functions. For numerical demonstration, we apply DTC-OCO to a general network resource allocation problem. Our simulation results suggest substantial performance gain by DTC-OCO over the current best alternative.
Published in: IEEE/ACM Transactions on Networking ( Volume: 31, Issue: 1, February 2023)