
Robotics and Autonomous Systems

Volume 54, Issue 9, 30 September 2006, Pages 766-778

Robot learning through task identification

https://doi.org/10.1016/j.robot.2006.04.015

Abstract

The operation of an autonomous mobile robot in a semi-structured environment is a complex, usually non-linear and partly unpredictable process. Lacking a theory of robot–environment interaction that allows the design of robot control code based on theoretical analysis, roboticists still have to resort to trial-and-error methods in mobile robotics.

The RobotMODIC project aims to develop a theoretical understanding of a robot’s interaction with its environment, and uses system identification techniques to identify the robot–task–environment system. In this paper, we present two practical examples of the RobotMODIC process: mobile robot self-localisation and mobile robot training to achieve door traversal.

In both examples, a transparent mathematical function is obtained that maps inputs (sensory perception in both cases) to outputs (location and steering velocity, respectively). Analysis of the obtained models reveals further information about the way in which a task is achieved, the relevance of individual sensors, possible ways of obtaining more parsimonious models, etc.

Introduction

The behaviour of a mobile robot is the result of properties of the robot itself (physical aspects, the “embodiment”), the environment (“situatedness”), and the control program (the “task”) the robot is executing (see Fig. 1). This triangle of robot, task and environment constitutes a complex, interacting and typically nonlinear system that is difficult to analyse, and whose behaviour is only partially predictable. Because of this lack of theoretical understanding, it is hard to develop robot control code without resorting to costly and imprecise trial-and-error methods.

The RobotMODIC project [1] investigates this relationship between robot, task and environment, and aims to develop an analytical description of the interdependency between the factors that govern robot behaviour. The approach we have taken so far is to model aspects of robot behaviour, using transparent mathematical models which, we argue, retain the essential properties of the robot’s behaviour, while being analysable through established mathematical procedures.

Essentially, we view the interaction between a mobile robot and its environment as a kind of “computation”, in which the robot “computes” behaviour (the output) from the three inputs: robot morphology, environmental characteristics and executed task (see Fig. 2).

Similar to a cylindrical lens, which can be used to perform an analog computation, highlighting vertical edges and suppressing horizontal ones, or a camera lens computing a Fourier transform by analog means, a robot’s behaviour (commonly the mobile robot’s trajectory) can be seen as emergent from the three components shown in Fig. 1: the robot “computes” its behaviour from its own makeup, the world’s makeup, and the program it is currently running (the task).

The RobotMODIC procedure, therefore, consists of four main steps (a minimal code sketch of the observation and validation steps follows this list):

(1) Observe the robot’s behaviour in the target environment, by logging sensory perception, motor response and location in the environment,

(2) model the relationship under investigation, using a transparent mathematical modelling process that expresses the relationship as a closed, analysable function,

(3) establish the validity of the obtained model by comparing the originally observed behaviour with that of the model,

(4) analyse the model using standard mathematical techniques, thus gaining understanding of the interaction between robot and environment.
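A minimal sketch of the observation step (1) and the validation step (3), assuming Python with NumPy and hypothetical robot accessors (read_sensors, read_motors and read_location are illustrative names, not a specific robot API); steps (2) and (4) correspond to the ARMAX/NARMAX modelling and analysis described in the remainder of this section.

    import numpy as np

    def observe(robot, n_steps):
        """Step 1: log sensory perception, motor response and location while
        the robot operates in the target environment."""
        log = []
        for _ in range(n_steps):
            log.append((robot.read_sensors(),    # e.g. laser/sonar ranges
                        robot.read_motors(),     # e.g. drive/steering velocities
                        robot.read_location()))  # e.g. (x, y) from an overhead tracker
        return log

    def validate(model, inputs, observed_outputs):
        """Step 3: compare the originally observed behaviour with the model's
        prediction, here via a simple root-mean-square error."""
        predicted = np.array([model(u) for u in inputs])
        return float(np.sqrt(np.mean((predicted - np.asarray(observed_outputs)) ** 2)))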

To obtain models of input–output relationships we use the ARMAX (linear autoregressive, moving average model with exogenous inputs, [2]) or NARMAX (nonlinear ARMAX) system identification methods; we therefore refer to this process as “robot identification”.

Both ARMAX and NARMAX model the input–output relationship as a mathematical function, in our case linear or nonlinear polynomials. ARMAX is available through standard programming packages such as Matlab or Scilab.

The NARMAX modelling approach is a parameter estimation methodology for identifying both the important model terms and the parameters of unknown non-linear dynamic systems. For multiple input, single output noiseless systems this model takes the form:

\[
\begin{aligned}
y(n) = f\bigl(\, & u_1(n), u_1(n-1), \ldots, u_1(n-N_u),\ u_1(n)^2, u_1(n-1)^2, \ldots, u_1(n-N_u)^2,\ \ldots,\ u_1(n)^l, u_1(n-1)^l, \ldots, u_1(n-N_u)^l, \\
& u_2(n), u_2(n-1), \ldots, u_2(n-N_u),\ u_2(n)^2, u_2(n-1)^2, \ldots, u_2(n-N_u)^2,\ \ldots,\ u_2(n)^l, u_2(n-1)^l, \ldots, u_2(n-N_u)^l, \\
& \ldots \\
& u_d(n), u_d(n-1), \ldots, u_d(n-N_u),\ u_d(n)^2, u_d(n-1)^2, \ldots, u_d(n-N_u)^2,\ \ldots,\ u_d(n)^l, u_d(n-1)^l, \ldots, u_d(n-N_u)^l, \\
& y(n-1), y(n-2), \ldots, y(n-N_y),\ y(n-1)^2, y(n-2)^2, \ldots, y(n-N_y)^2,\ \ldots,\ y(n-1)^l, y(n-2)^l, \ldots, y(n-N_y)^l \,\bigr)
\end{aligned}
\]

where y(n) and u(n) are the sampled output and input signals at time n respectively, N_y and N_u are the regression orders of the output and input respectively, and d is the input dimension. f(·) is a non-linear function, typically taken to be a polynomial or wavelet multi-resolution expansion of its arguments. The degree l of the polynomial is the highest sum of powers in any of its terms.
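To make this form concrete, the sketch below (assuming Python with NumPy; the function name and layout are ours, not part of the RobotMODIC tool chain) builds the corresponding candidate regressors, i.e. the powers 1 to l of every lagged input u_i(n-k), k = 0, ..., N_u, and of every lagged output y(n-j), j = 1, ..., N_y, from logged data:

    import numpy as np

    def narmax_regressors(u, y, Nu, Ny, l):
        """Candidate regressor matrix for the polynomial form above: powers
        1..l of each lagged input u_i(n-k), k = 0..Nu, and of each lagged
        output y(n-j), j = 1..Ny (no cross-terms).
        u: (N, d) array of d input signals, y: length-N output signal."""
        u, y = np.asarray(u, float), np.asarray(y, float)
        N, d = u.shape
        start = max(Nu, Ny)                 # first sample with all lags available
        cols, names = [], []
        for i in range(d):
            for k in range(Nu + 1):
                for p in range(1, l + 1):
                    cols.append(u[start - k:N - k, i] ** p)
                    names.append(f"u{i + 1}(n-{k})^{p}")
        for j in range(1, Ny + 1):
            for p in range(1, l + 1):
                cols.append(y[start - j:N - j] ** p)
                names.append(f"y(n-{j})^{p}")
        return np.column_stack(cols), names, y[start:]   # regressors, labels, y(n)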

The NARMAX methodology breaks the modelling problem into the following steps:

(1) Structure detection,
(2) parameter estimation,
(3) model validation,
(4) prediction, and
(5) analysis.

A detailed account of how these steps are carried out is given in [3], [4], [5]; a brief explanation follows below.

Any data set that we intend to model is first split into two sets (usually of equal size). The first, referred to as the estimation data set, is used to calculate the model parameters. The remaining data, referred to as the test set, is used to test and evaluate the model.

The structure of the NARMAX polynomial is determined by the inputs u, the output y, the input and output orders Nu and Ny respectively, and the degree l of the polynomial. Depending on these variables, the number of terms of the NARMAX model polynomial can be very large, but not all of them are significant contributors to the computation of the output; in fact, most terms can often be safely removed from the model equation without introducing any significant errors.
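To make the size of this candidate set concrete, the term count for the pure-power form quoted above (no cross-products between different lagged signals), including a constant term, is

\[
N_{\mathrm{terms}} = d\,(N_u + 1)\,l + N_y\,l + 1 .
\]

With illustrative (assumed) values of d = 16 range sensors, N_u = N_y = 2 and degree l = 2 this already gives 16 · 3 · 2 + 2 · 2 + 1 = 101 candidate terms; if all cross-products up to degree l are admitted as well, the count grows combinatorially to \(\binom{m+l}{l}\) with m = d(N_u + 1) + N_y, i.e. 1326 terms for the same values.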

The calculation of the NARMAX model parameters is an iterative process. Each iteration involves three steps: (i) estimation of model parameters, (ii) model validation and (iii) removal of non-contributing terms.

In the first step the NARMAX model is used to compute an equivalent auxiliary model whose terms are orthogonal, thus allowing their associated parameters to be calculated sequentially and independently from each other. Once the parameters of the auxiliary model are obtained, the NARMAX model parameters are computed from the auxiliary model.

The NARMAX model is then evaluated on the test data set. If the error between the model-predicted output and the actual output is below a user-defined threshold, non-contributing model terms are removed in order to reduce the size of the polynomial.

To determine the contribution of a model term to the output, the so-called Error Reduction Ratio (ERR) [4] is computed for each term. The ERR of a term is the percentage reduction in the total mean-squared error (i.e. the difference between model-predicted and true system output) that results from including the term under consideration in the model equation. The larger the ERR, the more significant the term. Model terms with an ERR under a certain threshold are removed from the model polynomial.

If, in the following iteration, the error is higher as a result of the last removal of model terms, then these terms are re-inserted into the model equation and the model is considered final.
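A simplified sketch of this term-selection loop is given below (assuming Python with NumPy). It follows the forward orthogonal-least-squares idea with the ERR defined above, but refits the surviving parameters by ordinary least squares rather than by back-substitution through the orthogonal auxiliary model, and uses a single ERR cutoff instead of the iterative remove-and-check scheme described in the text:

    import numpy as np

    def forward_select_err(P, y, err_cutoff=1e-3):
        """Greedy forward selection of model terms by Error Reduction Ratio.
        P: (N, M) matrix of candidate regressors (e.g. from narmax_regressors),
        y: length-N output. Returns the selected column indices and their
        least-squares parameters."""
        y = np.asarray(y, float)
        sigma = y @ y                        # total output energy
        selected, W = [], []                 # chosen indices / orthogonalised columns
        while True:
            best_err, best_idx, best_w = 0.0, None, None
            for j in range(P.shape[1]):
                if j in selected:
                    continue
                w = P[:, j].astype(float)
                for wk in W:                 # orthogonalise against already chosen terms
                    w = w - (wk @ w) / (wk @ wk) * wk
                if w @ w < 1e-12:            # numerically dependent column
                    continue
                err = (w @ y) ** 2 / ((w @ w) * sigma)   # ERR of this candidate
                if err > best_err:
                    best_err, best_idx, best_w = err, j, w
            if best_idx is None or best_err < err_cutoff:
                break                        # no remaining term contributes enough
            selected.append(best_idx)
            W.append(best_w)
        theta, *_ = np.linalg.lstsq(P[:, selected], y, rcond=None)
        return selected, theta

In practice the logged data would first be split into the estimation and test halves described above, with selection run on the estimation half and the resulting model judged on its prediction error over the test half.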

The RobotMODIC process offers some very practical benefits to the robotics engineer. It is possible, for instance, to represent robot control code in a very condensed and “universal” format that allows the easy transfer of robot control code between different robot platforms [6] (see, for instance, Section 3 of this paper). It is also possible to represent one sensor modality in terms of another; for example, sonar sensor perceptions can be expressed as functions of laser perceptions (this “sensor identification” is obviously only possible if both sensor modalities contain similar information: to model laser perception as a function of bumper signals, for example, is of course impossible! [7]).
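As a small, hypothetical illustration of such sensor identification, the sketch below reuses narmax_regressors and forward_select_err from the earlier sketches to model one sonar channel as a static degree-2 polynomial of the laser readings (randomly generated stand-ins replace real logs so the fragment runs on its own):

    import numpy as np

    # Hypothetical stand-in data; in practice, time-aligned laser scans and one
    # sonar channel logged on the robot would be used here.
    rng = np.random.default_rng(1)
    laser_scans = rng.uniform(0.1, 8.0, size=(1000, 16))
    sonar_channel = rng.uniform(0.1, 8.0, size=1000)

    # Static (Nu = Ny = 0) degree-2 polynomial of the laser readings.
    X, names, target = narmax_regressors(laser_scans, sonar_channel, Nu=0, Ny=0, l=2)
    terms, theta = forward_select_err(X, target, err_cutoff=1e-3)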

In this paper, we present two practical examples of how this process can be applied to mobile robotics:

(1) Robot navigation: self-localisation of a mobile robot.
(2) Robot programming: obtaining robot control code through operator-supervised training.

Section snippets

Application 1: Mobile robot self-localisation

The experimental scenario we are interested in analysing here is that of a physical mobile robot moving through a real-world environment, and determining its position in continuous Cartesian coordinates x,y, based on external signals (sensory perception P), as well as previous hypotheses regarding the robot’s position. This scenario, which is similar to evidence-based localisation [8], [9], [10] and simultaneous localisation and mapping [11], [12] scenarios, is shown in Fig. 3.
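For the linear, single-output case used here, a minimal sketch of such a localisation model is given below (assuming Python with NumPy; the lag orders, sensor pre-processing and the actual model structure reported in the paper are not reproduced). One model of this form is estimated for the x coordinate and another for y:

    import numpy as np

    def fit_armax_position(P, x, Nu=1, Ny=1):
        """Fit a linear ARMAX-style map from range perceptions to one position
        coordinate. P: (N, d) logged laser/sonar readings, x: length-N coordinate
        (e.g. from an overhead tracker); lag orders here are illustrative."""
        P, x = np.asarray(P, float), np.asarray(x, float)
        start = max(Nu, Ny)
        cols = [P[start - k:len(P) - k, :] for k in range(Nu + 1)]         # lagged inputs
        cols += [x[start - j:len(x) - j, None] for j in range(1, Ny + 1)]  # previous position values
        A = np.hstack(cols + [np.ones((len(x) - start, 1))])               # plus a bias term
        theta, *_ = np.linalg.lstsq(A, x[start:], rcond=None)
        return theta

Inspecting the magnitude of the resulting coefficients gives a first indication of which sensors actually contribute to the position estimate, in the spirit of the analysis summarised later in the paper.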

In contrast to

Application 2: Robot training

In a second set of experiments our objective was to obtain transparent, analysable models of a reactive sensor–motor task. As in the localisation task, our prime objective was transparent modelling, in addition to achieving the specific task itself.
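A minimal sketch of this kind of training run follows (assuming Python with NumPy and reusing narmax_regressors and forward_select_err from the earlier sketches; randomly generated stand-ins replace the real demonstration logs). Sensor readings and the demonstrated steering velocity are logged while an operator drives the robot, a polynomial model is identified from the log, and the model is then evaluated on fresh perceptions to produce steering commands:

    import numpy as np

    # Hypothetical demonstration log; in practice these are recorded while a human
    # operator drives the robot through the task.
    rng = np.random.default_rng(0)
    demo_sensor_log = rng.uniform(0.1, 5.0, size=(500, 16))   # 16 range readings per step
    demo_steering_log = rng.uniform(-0.5, 0.5, size=500)      # demonstrated steering velocity

    # Identify a degree-2 polynomial mapping lagged perceptions to steering velocity.
    X, names, steer_target = narmax_regressors(demo_sensor_log, demo_steering_log,
                                               Nu=1, Ny=0, l=2)
    terms, theta = forward_select_err(X, steer_target, err_cutoff=1e-3)

    def steering_command(recent_sensors):
        """Evaluate the identified polynomial on the most recent scans
        (at least Nu + 1 = 2 consecutive rows are required)."""
        rows, _, _ = narmax_regressors(recent_sensors,
                                       np.zeros(len(recent_sensors)), Nu=1, Ny=0, l=2)
        return float(rows[-1, terms] @ theta)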

To obtain the data needed for modelling the sensor–motor mapping, we employed the experimental method of robot teaching, sometimes referred to as “behavioural cloning” (see [14] for a review article). This method has successfully been used in

Summary

In this paper we present two applications of the RobotMODIC process. In the first case, a linear ARMAX polynomial for robot self-localisation is estimated, which gives the robot’s position as a function of its laser and sonar sensor perceptions.

Initially, no refinement was made to the estimated model polynomial, i.e. all sensors were taken into account to produce the model. By looking at the contribution of each model term in the resulting polynomial it could be seen that not all sensor inputs

Acknowledgements

The RobotMODIC project is supported by the British Engineering and Physical Sciences Research Council under grant GR/S30955/01. Roberto Iglesias is supported through the Spanish research grants PGIDIT04TIC206011PR, TIC2003-09400-C04-03, and TIN2005-03844.


References (22)

• D. Fox et al., Active Markov localisation for mobile robots, Robotics and Autonomous Systems (1998).
• N. Tomatis et al., Hybrid simultaneous localization and map building: a natural integration of topological and metric, Robotics and Autonomous Systems (2003).
• U. Nehmzow et al., Robotmodic: Modelling, identification and characterisation of mobile robots.
• R. Pearson, Discrete-time Dynamic Models (1999).
• S.A. Billings et al., The determination of multivariable nonlinear models for dynamical systems.
• M. Korenberg et al., Orthogonal parameter estimation algorithm for non-linear stochastic systems, International Journal of Control (1988).
• S.A. Billings et al., Correlation based model validity tests for non-linear models, International Journal of Control (1986).
• T. Kyriacou, U. Nehmzow, R. Iglesias, S.A. Billings, Cross-platform programming through system identification, in:...
• U. Nehmzow, Scientific Methods in Mobile Robotics—Quantitative Analysis of Agent Behaviour (2006).
• S. Thrun, Bayesian landmark learning for mobile robot localisation, Machine Learning (1998).
• F. Dellaert, D. Fox, W. Burgard, S. Thrun, Monte carlo localization for mobile robots, in: Proceedings of the IEEE...

Ulrich Nehmzow is Reader in Analytical and Cognitive Robotics in the Department of Computer Science at the University of Essex.

He obtained his Diploma in Electrical Engineering and Information Science from the University of Technology at Munich in 1988, and his Ph.D. in Artificial Intelligence from the University of Edinburgh in 1992. He is a chartered engineer (CEng) and a member of the IEE. After a postdoctoral position at Edinburgh University he became Lecturer in Artificial Intelligence at the University of Manchester in 1994, and Senior Lecturer in Robotics at the University of Essex in 2001. His research interests are scientific methods in robotics, robot learning, robot navigation, simulation and modelling and novelty detection for industrial applications of robotics.

Ulrich Nehmzow is the co-chair of the British conference on mobile robotics (TAROS) and member of the editorial board of Journal of Connection Science and the AISB journal. He is secretary of the International Society for Adaptive Behavior.

Roberto Iglesias received the B.S. and Ph.D. degrees in physics from the University of Santiago de Compostela, Spain, in 1996 and 2003, respectively. He is currently an Associate Professor in the Department of Electronics and Computer Science at the University of Santiago de Compostela, Spain. His research interests focus on control and navigation in mobile robotics and machine learning, mainly reinforcement learning and artificial neural networks.

Theocharis Kyriacou is currently a Senior Research Officer at the University of Essex (UK). He obtained his Ph.D. in Vision-Based Urban Navigation Procedures for Verbally Instructed Robots from the University of Plymouth (UK) in 2004. He earned his B.Eng. (Honours) degree in Electronic Engineering (Systems) from the University of Sheffield in 2000. Prior to studying for his undergraduate degree he obtained a Higher Engineering diploma in Electrical Engineering from the Higher Technical Institute (H.T.I.) of Cyprus.

Stephen A. Billings received the B.Eng. degree in Electrical Engineering with first class honours from the University of Liverpool in 1972, the degree of Ph.D. in Control Systems Engineering from the University of Sheffield in 1976, and the degree of D.Eng. from the University of Liverpool in 1990. He is a Chartered Engineer (CEng), Chartered Mathematician (CMath), Fellow of the IEE (UK) and Fellow of the Institute of Mathematics and its Applications. He was appointed as Professor in the Department of Automatic Control and Systems Engineering, University of Sheffield, UK in 1990 and leads the Signal Processing and Complex Systems research group. His research interests include system identification and information processing for nonlinear systems, NARMAX methods, model validation, prediction, spectral analysis, adaptive systems, nonlinear systems analysis and design, neural networks, wavelets, fractals, machine vision, cellular automata, spatio-temporal systems, fMRI and optical imagery of the brain.
