Abstract:
A connection has recently been drawn between Dynamic Optimization Problems (DOPs) and Reinforcement Learning Problems (RLPs), where both can be seen as subsets of a broader class of Sequential Decision-Making Problems (SDMPs). SDMPs require new decisions on an ongoing basis, and typically the underlying environment changes between decisions. The SDMP view is useful because it allows this unified problem space to be explored, so that solutions can be designed for the characteristics of problem instances using algorithms from either community. However, little work has been done comparing algorithm performance across these communities, particularly under real-world resource constraints. In this paper we lay the theoretical foundations for the concept of offline and online time in SDMPs. We implement a method, based on the theoretical formulations, to limit offline time on representative algorithms, and we investigate their online performance on a Conceptual Moving Peaks Benchmark (CMPB). Our results show that the performance of an Evolutionary Dynamic Optimisation (EDO) algorithm depends on the offline time constraint, while the performance of an EDO-hybrid is noticeably impacted only past a lower bound on the size of the state-action space. Our method evaluates the effects of resource constraints on online algorithm performance and is a promising start to a rigorous method of algorithm selection for real-world problems.
Date of Conference: 06-09 December 2016
Date Added to IEEE Xplore: 13 February 2017
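To make the abstract's offline/online distinction concrete, the following is a minimal sketch of one way an offline compute-time budget could be enforced between decisions in a sequential decision-making loop. It is not the paper's actual implementation: the objects `env` and `algorithm`, and their methods `observe`, `act`, `improve`, and `best_decision`, are hypothetical stand-ins (e.g. `improve` might run one generation of an EDO algorithm or one RL update).

```python
import time

def run_online(env, algorithm, offline_budget_s, horizon):
    """Run a sequential decision-making loop, capping the offline
    computation time the algorithm may spend between decisions.

    All interfaces here are illustrative assumptions:
      env.observe()           -> current problem state
      env.act(decision)       -> applies a decision, returns online reward
      algorithm.improve(s)    -> one increment of offline search on state s
      algorithm.best_decision() -> best decision found so far
    """
    total_reward = 0.0
    for t in range(horizon):
        state = env.observe()
        # Offline phase: search until the wall-clock budget is spent.
        deadline = time.monotonic() + offline_budget_s
        while time.monotonic() < deadline:
            algorithm.improve(state)
        # Online phase: commit the best decision found within the budget.
        total_reward += env.act(algorithm.best_decision())
    return total_reward
```

Under this framing, sweeping `offline_budget_s` and comparing `total_reward` across algorithms is one plausible way to study how online performance varies with the offline time constraint, as the abstract describes.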