Learning action models for the improved execution of navigation plans

https://doi.org/10.1016/S0921-8890(02)00163-X

Abstract

Most state-of-the-art navigation systems for autonomous service robots decompose navigation into global navigation planning and local reactive navigation. While the methods for navigation planning and local navigation themselves are well understood, the plan execution problem, that is, the problem of how to generate and parameterize local navigation tasks from a given navigation plan, is largely unsolved.

This paper describes how a robot can autonomously learn to execute navigation plans. We formalize the problem as a Markov Decision Process (MDP) and derive a decision theoretic action selection function from it. The action selection function employs models of the robot’s navigation actions, which are autonomously acquired from experience using neural networks or regression tree learning algorithms. We show, both in simulation and on an RWI B21 mobile robot, that the learned models together with the derived action selection function achieve competent navigation behavior.

Introduction

Robot navigation is the task of reliably and quickly navigating to specified locations in the robot’s operating environment. Most state-of-the-art navigation systems for mobile service robots consist of components for global navigation planning and local reactive navigation, in addition to components for map learning and the estimation of the robot’s position [1].

Using a map of the operating environment, global navigation planning modules compute plans for navigating to specified locations in the environment. Navigation plans are typically specified as sequences of discrete actions, mappings from robot states into discrete navigation actions (navigation policies), or paths, sequences of intermediate destinations. Latombe [2] gives a comprehensive overview of algorithms for path and motion planning. Approaches for computing navigation policies in the MDP framework are described in [3], [4], [5].
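To make these three plan representations concrete, the sketch below shows how a discrete action sequence, a navigation policy, and a path of intermediate destinations could be encoded as simple data structures. All names and values are illustrative and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A coarse planning state, e.g. a grid cell index (illustrative only).
State = Tuple[int, int]

# (1) A navigation plan as a sequence of discrete actions.
action_sequence: List[str] = ["forward", "forward", "turn_left", "forward"]

# (2) A navigation plan as a policy: a mapping from states to actions,
#     as computed by MDP-based navigation planners.
policy: Dict[State, str] = {
    (0, 0): "forward",
    (1, 0): "turn_left",
    (1, 1): "forward",
}

# (3) A navigation plan as a path: a sequence of intermediate destinations
#     given as metric (x, y) coordinates, as produced by path planners.
path: List[Tuple[float, float]] = [(0.0, 0.0), (2.5, 0.0), (2.5, 4.0)]
```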

Researchers have also investigated a variety of methods for carrying out local navigation tasks. Local navigation tasks are those in which the destinations are located in the surroundings of the robot and for which no global (static) map of the environment is required. Reactive navigation methods often employ concurrent, continuous, sensor-driven control processes together with a behavior arbitration method that continually combines the output signals of the individual control processes into common control signals for the robot’s drive. Arkin [6] gives a comprehensive introduction to the principles of (behavior-based) reactive control. Other approaches generate trajectories for local navigation tasks based on simple models of the robot dynamics and choose the trajectory with the highest utility with respect to some given evaluation function [7], [8]. Konolige [14] suggests performing fast shortest-path planning on the robot’s local map.
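As a rough illustration of the arbitration idea, the sketch below combines the translational and rotational velocity votes of two concurrent behaviors into a single drive command by weighted averaging. The behaviors, weights, and sensor fields are hypothetical and are not drawn from any of the cited systems.

```python
from typing import Callable, Dict, List, Tuple

# Each behavior maps sensor readings to a (translational, rotational) velocity vote.
Command = Tuple[float, float]
Behavior = Callable[[Dict[str, float]], Command]

def go_to_target(sensors: Dict[str, float]) -> Command:
    # Drive forward while turning toward the target bearing (illustrative).
    return 0.5, 0.3 * sensors["target_bearing"]

def avoid_obstacles(sensors: Dict[str, float]) -> Command:
    # Slow down and turn away as the nearest obstacle gets closer (illustrative).
    closeness = max(0.0, 1.0 - sensors["min_obstacle_dist"])
    return -0.4 * closeness, 0.8 * closeness

def arbitrate(behaviors: List[Tuple[float, Behavior]],
              sensors: Dict[str, float]) -> Command:
    """Weighted average of the individual behaviors' velocity votes."""
    total_w = sum(w for w, _ in behaviors)
    v = sum(w * b(sensors)[0] for w, b in behaviors) / total_w
    omega = sum(w * b(sensors)[1] for w, b in behaviors) / total_w
    return v, omega

command = arbitrate([(1.0, go_to_target), (2.0, avoid_obstacles)],
                    {"target_bearing": 0.2, "min_obstacle_dist": 0.6})
```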

The methods used for navigation planning and local navigation are difficult to integrate. Navigation planning is performed as open-loop control, often assumes an abstract discretization of the state and action space, and typically works in a drastically reduced state space that ignores moving obstacles and the robot’s dynamic state. Local navigation, on the other hand, is performed as closed-loop control, steers continuous behaviors, uses asynchronously arriving sensor data streams, and considers moving obstacles and the robot’s dynamic state.

In many modern robot navigation systems, both processes take place at different layers of control which are separated by an abstract interface. In the RHINO system [5], for example, the global path planning system sends intermediate target points to the local navigation component. The local navigation component then tries to approach these target points while avoiding collisions with moving or static obstacles [7]. The plan execution problem in this case is the problem of how to compute target points for the local navigation component from the global navigation plan. In general, the plan execution problem is that of choosing appropriate actions, executable by the local navigation component, given the global navigation plan.
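A simple baseline for this kind of plan execution, shown here only as a hypothetical sketch and not as the mechanism proposed in this paper, is to send the local navigation component the farthest waypoint of the global path that still lies within a fixed lookahead distance:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def next_target_point(robot_pos: Point, path: List[Point],
                      lookahead: float = 2.0) -> Point:
    """Pick the farthest waypoint along the path within the lookahead radius.

    A fixed-lookahead heuristic; the approach in this paper instead learns
    which local navigation action to choose based on expected performance.
    """
    target = path[0]
    for waypoint in path:
        if math.dist(robot_pos, waypoint) <= lookahead:
            target = waypoint
        else:
            break
    return target

print(next_target_point((0.0, 0.0), [(0.5, 0.0), (1.5, 0.0), (3.0, 0.0)]))
```

Such fixed heuristics are exactly what a learned, situation-dependent action selection is meant to replace, since the best target point and parameterization depend on the current state of the robot and its surroundings.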

Fig. 1 illustrates that the generation of local navigation tasks and the parameterization of the local navigation processes for a given navigation plan have an enormous impact on the performance of the robot. The figure shows two navigation traces that are generated by the same navigation plan and local navigation module using different plan execution schemes. As the figure shows, the behavior depicted on the right is much smoother. Beetz and Belker [9] give a more detailed description of the experiment and a thorough explanation of the results.

In this paper, we propose a novel solution to the plan execution problem. We formalize the navigation plan execution problem as a Markov Decision Process (MDP). From this MDP, we derive a decision theoretic action selection function. It selects actions that maximize the expected local performance given the plan computed by the global path planning component and models of the actions supported by the local navigation component. The action models comprise the probability of a failure and the expected duration for executing the actions. We show that the models needed for action selection can be automatically learned by the robot. We use simple multi-layer perceptrons and regression trees for the learning task and give empirical results of their performance. We illustrate the power of the derived action selection function together with the learned models both in simulation and on an RWI B21 mobile robot.
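The sketch below illustrates only the general shape of such a decision theoretic selection rule: each candidate local navigation action is scored using a learned failure probability and a learned expected duration, and the best-scoring action is chosen. The scoring scheme, the penalty constant, and all names are our own illustrative assumptions; the paper's actual selection function (Eq. (2)) is derived from the MDP formalization in Section 2.

```python
from typing import Callable, List

def select_action(state,
                  candidate_actions: List[str],
                  p_failure: Callable,      # learned model: P(failure | s, a)
                  exp_duration: Callable,   # learned model: E[duration | s, a]
                  failure_penalty: float = 60.0) -> str:
    """Pick the action with the lowest expected cost (illustrative scoring)."""
    def expected_cost(a: str) -> float:
        # Expected execution time plus the failure probability weighted by
        # a fixed penalty for failed navigation actions.
        return exp_duration(state, a) + p_failure(state, a) * failure_penalty

    return min(candidate_actions, key=expected_cost)

# Hypothetical usage with constant stand-ins for the learned models.
best = select_action(
    state=None,
    candidate_actions=["target_1m_ahead", "target_3m_ahead"],
    p_failure=lambda s, a: 0.05 if a == "target_1m_ahead" else 0.15,
    exp_duration=lambda s, a: 8.0 if a == "target_1m_ahead" else 5.0,
)
```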

The main scientific contribution of this paper is the formalization of a key problem in the development of hybrid control systems for mobile robots: the plan execution problem. Designing appropriate plan execution mechanisms is very hard and therefore such mechanisms are often implemented in an ad hoc manner and hand-tuned for different applications. In our approach, a plan execution mechanism is autonomously learned by the robot.

The remainder of this paper is organized as follows. Section 2 states the navigation plan execution problem as a Markov Decision Process and derives an action selection function from the MDP. In Section 3, we describe how the action models necessary to compute a policy for the problem can be autonomously learned. In Section 4, we experimentally demonstrate that the learned action selection policy significantly improves the robot’s navigation performance. Section 5 discusses related work.

Section snippets

The plan execution problem as an MDP

In this section, we briefly introduce the notion of MDPs, formulate the plan execution problem as an MDP, and derive an action selection function for the MDP.
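For readers unfamiliar with the framework, an MDP consists of a state space, an action space, a transition model, and a reward (or cost) function, and a decision theoretic action selection has the generic greedy form below. This is the standard textbook formulation, not the paper's specific Eq. (2).

```latex
% Generic MDP and greedy decision theoretic action selection
% (standard textbook form, not the paper's Eq. (2)).
\[
  \mathcal{M} = (S, A, P, R), \qquad
  P(s' \mid s, a), \qquad
  R : S \times A \to \mathbb{R}
\]
\[
  \pi(s) \;=\; \operatorname*{arg\,max}_{a \in A}
  \sum_{s' \in S} P(s' \mid s, a)\,\bigl[\, R(s, a) + \gamma\, V(s') \,\bigr]
\]
```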

Learning the models

Let us now consider how the function v : S × A → ℝ, which gives the average velocity when executing action a in state s, and the probabilistic action model P(S′ | S, A) can be learned. To do so, we will first define a suitable feature language for describing state–action pairs (s, a) that comprises observable conditions expected to correlate with the navigation performance.
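As a hedged sketch of how such a model can be fitted from recorded navigation episodes: the paper uses multi-layer perceptrons and regression tree learning, and the scikit-learn regression tree below is our substitution for illustration. The feature vectors and target values are made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Training data collected from navigation runs: each row is a feature vector
# describing a state-action pair (s, a), e.g. clearance toward the target,
# clearance at the current position, distance and angle to the target point.
X = np.array([
    [1.8, 1.2, 2.5, 0.1],
    [0.4, 0.6, 1.0, 0.8],
    [2.2, 1.9, 3.0, 0.0],
])
# Observed average velocity while executing the action in that state.
y = np.array([0.45, 0.12, 0.50])

# Regression tree approximating v : S x A -> R from the feature description.
velocity_model = DecisionTreeRegressor(max_depth=3).fit(X, y)

# Predict the expected average velocity for a new state-action pair.
print(velocity_model.predict([[1.5, 1.0, 2.0, 0.2]]))
```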

In our learning experiments, we use the following features: (1) clearance towards the target position, (2) clearance at current position,

Experimental results

In this section, we will demonstrate (1) that the action selection function defined by Eq. (2) can be used to execute a global navigation plan quickly and reliably and (2) that the necessary models can be learned autonomously. We have performed both simulator experiments and experiments on a real robot to show that the learned action selection improves the robot’s performance substantially and significantly.

The experiments are run on an RWI B21 mobile robot platform and its simulator. We use

Related work

In this paper, we have applied the MDP framework to the problem of navigation plan execution. To the best of our knowledge, this is a new application of the MDP framework. However, it has been applied to mobile robot navigation before, for example, to plan navigation policies [3], [4]. Simmons and Koenig [4] use a state space derived from a topological map of the robot’s working environment. They use a coarse discretization of 1 m in each direction but also consider the robot’s orientation (four

Conclusion

We have discussed how the problem of navigation plan execution can be formalized in the MDP framework and how a decision theoretic action selection function can be derived from that formalization. Standard learning techniques like neural networks and decision/regression trees can be applied to learn the required models. The approach has been tested both in simulation and on an RWI B21 mobile robot to demonstrate that it improves the robot’s behavior significantly compared to a plan execution

References (25)

  • D. Kortenkamp, R. Bonasso, R. Murphy (Eds.), AI-based Mobile Robots: Case Studies of Successful Robot Systems, MIT...
  • J.-C. Latombe, Robot Motion Planning, Kluwer Academic Publishers, Boston, MA,...
  • L. Kaelbling, A. Cassandra, J. Kurien, Acting under uncertainty: Discrete Bayesian models for mobile-robot navigation,...
  • R. Simmons, S. Koenig, Probabilistic robot navigation in partially observable environments, in: Proceedings of the 14th...
  • S. Thrun, A. Buecken, W. Burgard, D. Fox, T. Fröhlinghaus, D. Hennig, T. Hofmann, M. Krell, T. Schmidt, AI-based Mobile...
  • R. Arkin, Behavior-based Robotics, MIT Press, Cambridge, MA,...
  • D. Fox, W. Burgard, S. Thrun, The dynamic window approach to collision avoidance, IEEE Robotics and Automation Magazine...
  • R. Simmons, The curvature-velocity method for local obstacle avoidance, in: Proc. IEEE International Conference on...
  • M. Beetz, T. Belker, Environment and task adaptation for robotic agents, in: Proceedings of the 14th European...
  • M. Beetz, W. Burgard, D. Fox, A. Cremers, Integrating Active Localization into High-level Control Systems, Robotics and...
  • L.P. Kaelbling, M.L. Littman, A.R. Cassandra, Planning and acting in partially observable stochastic domains,...
  • D. Fox, W. Burgard, S. Thrun, Markov localization for mobile robots in dynamic environments, Journal of Artificial...

    Thorsten Belker works as a research scientist at the Intelligent Autonomous Systems Group, University of Bonn. His research interests include artificial intelligence, machine learning and mobile robotics with a focus on robot learning. He received his diploma in Computer Science from the University of Bonn in 1999.

    Michael Beetz is a lecturer in Munich University of Technology’s Department of Computer Science. His research interests include artificial intelligence, plan-based control of autonomous agents, and intelligent autonomous robotics. He received his PhD in Computer Science from Yale University and his venia legendi from the University of Bonn in 2001.

    Armin B. Cremers is a full professor of Computer Science III at the University of Bonn. Prior to this position he was a full professor at the University of Dortmund and an assistant professor at the University of Southern California. His research interests include database and information systems, autonomous mobile robots, and artificial intelligence.

    The research reported in this paper is partly funded by the Deutsche Forschungsgemeinschaft (DFG) under contract number BE 2200/3-1.
