Learning heuristic functions and search policies for classical planning
Author(s)
Gomoluch, Pawel Krzysztof
Type
Thesis or dissertation
Abstract
Automated planning deals with the problem of composing sequences of actions which, when executed from a given initial state, achieve a given goal. Classical planning, in particular, relies on a known and deterministic model of the environment. In this thesis, we consider learning from the planner's own experience as a way of improving its performance.
Modern classical planners are typically based on heuristic forward search. Such planners include two key components: a search routine, such as A* or greedy best-first search, and a heuristic function, which guides the search by evaluating any given state of the environment. A domain-independent heuristic function can be applied to problems from any planning domain.
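For illustration, the sketch below shows a minimal greedy best-first search of the kind such planners build on; the problem, successor function and heuristic here are placeholder assumptions, not the planners or benchmarks used in the thesis.

```python
# Minimal sketch of greedy best-first search: always expand the open node
# with the lowest heuristic value. Purely illustrative.
import heapq

def greedy_best_first_search(initial_state, is_goal, successors, heuristic):
    """Return a plan (list of actions) from initial_state to a goal, or None."""
    counter = 0  # tie-breaker so heapq never compares states directly
    open_list = [(heuristic(initial_state), counter, initial_state, [])]
    closed = set()
    while open_list:
        _, _, state, plan = heapq.heappop(open_list)
        if is_goal(state):
            return plan
        if state in closed:
            continue
        closed.add(state)
        for action, next_state in successors(state):
            if next_state not in closed:
                counter += 1
                heapq.heappush(
                    open_list,
                    (heuristic(next_state), counter, next_state, plan + [action]),
                )
    return None  # no plan found

# Toy usage: plan a path on the integer line from 0 to 5.
plan = greedy_best_first_search(
    0,
    lambda s: s == 5,
    lambda s: [("inc", s + 1), ("dec", s - 1)],
    lambda s: abs(5 - s),
)
print(plan)  # ['inc', 'inc', 'inc', 'inc', 'inc']
```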
The first part of the thesis investigates the use of machine learning techniques to obtain heuristic functions. Existing work on learning heuristics has focused on domain-specific functions. We propose a domain-independent approach, which allows heuristics to be learned from data representing multiple domains and then deployed on any domain, thus preserving the generality of the planner. The learning task takes the form of supervised learning: solutions to a number of training problems are used to learn a heuristic function, which can then be applied to unseen problems and domains. We evaluate the approach experimentally and indicate its potential strengths and weaknesses.
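The following sketch illustrates this supervised setting under simplifying assumptions: states encountered on solved training problems are represented by hand-crafted numeric features, and a linear regressor is fit to the observed cost-to-go. The feature map, model and data are hypothetical stand-ins, not the representation used in the thesis.

```python
# Supervised learning of a heuristic from solved training problems (sketch).
import numpy as np

def fit_heuristic(features, costs_to_go):
    """Fit a linear model h(s) ~= w . phi(s) by least squares."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(costs_to_go, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda phi: float(np.dot(w, phi))

# Each row: feature vector of a state on a training solution, paired with the
# remaining plan length observed from that state (its cost-to-go).
train_features = [[3.0, 1.0], [2.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
train_costs = [3.0, 2.0, 1.0, 0.0]

h = fit_heuristic(train_features, train_costs)
print(h([2.0, 1.0]))  # approximately 2.0
```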
In the second part of the thesis, we turn our attention to the search routine. The search routines used in classical planning are typically based on best-first search and remain fixed throughout the process of solving a problem. In this thesis, we construct search routines capable of changing the search approach while solving a problem. This allows us to formulate the task of learning search policies. The purpose of a search policy is to indicate the most suitable way of continuing the search given the current state of the search. We consider two distinct approaches. The first relies on a fixed set of search subroutines, so that learning the policy translates to reinforcement learning with a discrete action space. The second introduces a novel parametrised search algorithm which combines a variety of search techniques in a single routine; here the search policy is responsible for setting the values of the algorithm's parameters, resulting in a learning task akin to reinforcement learning in continuous action spaces. Experimental evaluation shows that the learners are capable of discovering efficient search policies tailored to given distributions of planning problems.
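As a rough illustration of the first approach, the sketch below learns a policy over a fixed set of search subroutines with a simple epsilon-greedy update; the subroutine names, reward signal and tabular policy are assumptions made for the example, not the algorithm developed in the thesis.

```python
# A policy that repeatedly picks one of a fixed set of search subroutines and
# updates its value estimates from a reward signal (e.g. search progress).
import random

SUBROUTINES = ["greedy_bfs_episode", "local_search_episode", "random_walk_episode"]

class EpsilonGreedyPolicy:
    def __init__(self, actions, epsilon=0.1, lr=0.1):
        self.q = {a: 0.0 for a in actions}   # estimated value of each subroutine
        self.epsilon = epsilon
        self.lr = lr

    def select(self):
        # Explore with probability epsilon, otherwise pick the best-valued subroutine.
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Move the value estimate toward the observed reward.
        self.q[action] += self.lr * (reward - self.q[action])

policy = EpsilonGreedyPolicy(SUBROUTINES)
for _ in range(100):
    a = policy.select()
    reward = random.random()  # placeholder for measured search progress
    policy.update(a, reward)
print(policy.q)
```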
Version
Open Access
Date Issued
2020-02
Date Awarded
2020-08
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Russo, Alessandra
Alrajeh, Dalal
Sponsor
Imperial College London
Fondazione Bruno Kessler, Trento, Italy
Publisher Department
Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)