Learning heuristic functions and search policies for classical planning
Author(s)
Gomoluch, Pawel Krzysztof
Type
Thesis or dissertation
Abstract
Automated planning deals with the problem of composing sequences of actions which, when executed from a given initial state, achieve a given goal. Classical planning, in particular, relies on a known and deterministic model of the environment. In this thesis, we consider learning from the planner's own experience as a way of improving its performance.
Modern classical planners are typically based on heuristic forward search. Such planners include two key components: a search routine, such as A* or greedy best-first search, and a heuristic function, which guides the search by evaluating any given state of the environment. A domain-independent heuristic function can be applied to problems from any planning domain.
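For illustration, the sketch below shows a minimal greedy best-first search of the kind such planners build on; the problem, successor function and heuristic here are placeholder assumptions, not the planners or benchmarks used in the thesis.

```python
# Minimal sketch of greedy best-first search: always expand the open node
# with the lowest heuristic value. Purely illustrative.
import heapq

def greedy_best_first_search(initial_state, is_goal, successors, heuristic):
    """Return a plan (list of actions) from initial_state to a goal, or None."""
    counter = 0  # tie-breaker so heapq never compares states directly
    open_list = [(heuristic(initial_state), counter, initial_state, [])]
    closed = set()
    while open_list:
        _, _, state, plan = heapq.heappop(open_list)
        if is_goal(state):
            return plan
        if state in closed:
            continue
        closed.add(state)
        for action, next_state in successors(state):
            if next_state not in closed:
                counter += 1
                heapq.heappush(
                    open_list,
                    (heuristic(next_state), counter, next_state, plan + [action]),
                )
    return None  # no plan found

# Toy usage: plan a path on the integer line from 0 to 5.
plan = greedy_best_first_search(
    0,
    lambda s: s == 5,
    lambda s: [("inc", s + 1), ("dec", s - 1)],
    lambda s: abs(5 - s),
)
print(plan)  # ['inc', 'inc', 'inc', 'inc', 'inc']
```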
The first part of the thesis investigates the use of machine learning techniques to obtain heuristic functions. Existing work on learning heuristics has focused on domain-specific functions. We propose a domain-independent approach, which allows heuristics to be learned from data representing multiple domains and then deployed on any domain, thus preserving the generality of the planner. The learning task takes the form of supervised learning: solutions to a number of training problems are used to learn a heuristic function, which can then be applied to unseen problems and domains. We evaluate the approach experimentally and indicate its potential strengths and weaknesses.
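The following sketch illustrates this supervised setting under simplifying assumptions: states encountered on solved training problems are represented by hand-crafted numeric features, and a linear regressor is fit to the observed cost-to-go. The feature map, model and data are hypothetical stand-ins, not the representation used in the thesis.

```python
# Supervised learning of a heuristic from solved training problems (sketch).
import numpy as np

def fit_heuristic(features, costs_to_go):
    """Fit a linear model h(s) ~= w . phi(s) by least squares."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(costs_to_go, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda phi: float(np.dot(w, phi))

# Each row: feature vector of a state on a training solution, paired with the
# remaining plan length observed from that state (its cost-to-go).
train_features = [[3.0, 1.0], [2.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
train_costs = [3.0, 2.0, 1.0, 0.0]

h = fit_heuristic(train_features, train_costs)
print(h([2.0, 1.0]))  # approximately 2.0
```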
In the second part of the thesis, we turn our attention to the search routine. The search routines used in classical planning are typically based on best-first search and remain fixed throughout the process of solving a problem. In this thesis, we construct search routines capable of changing the search approach while solving a problem. This allows us to formulate the task of learning search policies. The purpose of a search policy is to indicate the most suitable way of continuing the search given the current state of the search. We consider two distinct approaches. The first relies on a fixed set of search subroutines, so that learning the policy translates to reinforcement learning with a discrete action space. The second introduces a novel parametrised search algorithm which combines a variety of search techniques in a single routine; here the search policy is responsible for setting the values of the algorithm's parameters, resulting in a learning task akin to reinforcement learning in continuous action spaces. Experimental evaluation shows that the learners are capable of discovering efficient search policies tailored to given distributions of planning problems.
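As a rough illustration of the first approach, the sketch below learns a policy over a fixed set of search subroutines with a simple epsilon-greedy update; the subroutine names, reward signal and tabular policy are assumptions made for the example, not the algorithm developed in the thesis.

```python
# A policy that repeatedly picks one of a fixed set of search subroutines and
# updates its value estimates from a reward signal (e.g. search progress).
import random

SUBROUTINES = ["greedy_bfs_episode", "local_search_episode", "random_walk_episode"]

class EpsilonGreedyPolicy:
    def __init__(self, actions, epsilon=0.1, lr=0.1):
        self.q = {a: 0.0 for a in actions}   # estimated value of each subroutine
        self.epsilon = epsilon
        self.lr = lr

    def select(self):
        # Explore with probability epsilon, otherwise pick the best-valued subroutine.
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Move the value estimate toward the observed reward.
        self.q[action] += self.lr * (reward - self.q[action])

policy = EpsilonGreedyPolicy(SUBROUTINES)
for _ in range(100):
    a = policy.select()
    reward = random.random()  # placeholder for measured search progress
    policy.update(a, reward)
print(policy.q)
```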
Version
Open Access
Date Issued
2020-02
Date Awarded
2020-08
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Russo, Alessandra
Alrajeh, Dalal
Sponsor
Imperial College London
Fondazione Bruno Kessler, Trento, Italy
Publisher Department
Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)