Abstract:
We extend the stochastic multi-armed bandit to the case where the number of arms to play evolves as a stationary process. Our work is motivated by demand response in power systems, in which the number of arms to play, or loads to dispatch, depends on a random power imbalance. We give an upper confidence bound-based algorithm that achieves sublinear pseudo-regret. We apply our results in several examples from demand response.
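The abstract does not describe the algorithm, so the following is only a hedged illustration of the general idea and not the authors' method. It assumes Bernoulli arm rewards and draws the number of arms to play, K_t, i.i.d. uniform on {1, ..., k_max}, which is one simple special case of a stationary process. Each round it plays the K_t arms with the largest UCB1-style indices and accumulates pseudo-regret against the best K_t arms. The function name `ucb_variable_k` and all of its parameters are hypothetical.

```python
import math
import random


def ucb_variable_k(means, horizon, k_max, seed=0):
    """Hypothetical sketch: UCB1-style indices, play the top K_t arms per round.

    means   -- true Bernoulli means (hidden from the learner; used only to
               simulate rewards and to compute pseudo-regret)
    horizon -- number of rounds T
    k_max   -- K_t is drawn i.i.d. uniform on {1, ..., k_max} (assumption:
               i.i.d. is a simple special case of a stationary process)
    Returns the cumulative pseudo-regret after `horizon` rounds.
    """
    n = len(means)
    if not 1 <= k_max <= n:
        raise ValueError("k_max must be between 1 and the number of arms")
    rng = random.Random(seed)
    counts = [0] * n      # number of times each arm has been played
    totals = [0.0] * n    # sum of observed rewards for each arm
    best_sorted = sorted(means, reverse=True)
    regret = 0.0

    for t in range(1, horizon + 1):
        k_t = rng.randint(1, k_max)  # random number of arms to play this round

        def index(i):
            # Unplayed arms get an infinite index so each is tried at least once.
            if counts[i] == 0:
                return math.inf
            mean_hat = totals[i] / counts[i]
            return mean_hat + math.sqrt(2.0 * math.log(t) / counts[i])

        # Play the k_t arms with the largest upper confidence bounds.
        chosen = sorted(range(n), key=index, reverse=True)[:k_t]
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            totals[i] += reward

        # Pseudo-regret uses true means: best k_t arms vs. the arms played.
        regret += sum(best_sorted[:k_t]) - sum(means[i] for i in chosen)

    return regret
```

For example, `ucb_variable_k([0.9, 0.8, 0.5, 0.3, 0.2], horizon=20000, k_max=3) / 20000` gives the average per-round pseudo-regret, which should be small relative to playing arms at random if the indices are concentrating on the best arms. Sharing one set of indices across all values of K_t is a design choice of this sketch: every play of an arm tightens its confidence interval regardless of how many arms that round required.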
Published in: IEEE Transactions on Automatic Control (Volume: 63, Issue: 7, July 2018)