Abstract:
This paper presents an end-to-end AC-D model for addressing the Vehicle Routing Problem with Time Windows (VRPTW) using reinforcement learning techniques. The proposed model employs a parameterized stochastic policy, which is trained through the observation of reward feedback and adherence to constraint rules. It then samples problem instances from a given distribution and identifies an approximately optimal solution. The model’s parameters are optimized using the policy gradient algorithm. Once trained, the model generates a sequence of continuous actions as solutions.The proposed model is built upon the Actor-Critic architecture and incorporates an encoder and decoder. Through dynamic interaction with the environment, it generates a set of sequences that satisfy the required constraints. By effectively minimizing the cost of all paths associated with the VRPTW, the model demonstrates its capability to optimize the overall routing efficiency.
Published in: 2023 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)
Date of Conference: 29-31 July 2023
Date Added to IEEE Xplore: 18 October 2023