1 Introduction

Autonomous navigation in rough terrain is an important goal in robotics. Achieving it requires, among other things, a virtual representation of the environment in which the robot is going to navigate (a map). This map must contain the traversal cost of each region, so that the robot knows how difficult each area will be to cross.

This paper presents the results of applying the control-theory idea of dynamical system identification to the task of learning traversability cost maps. The rough, dynamic environment is treated as a dynamical system, and the data measured by the robot's sensors serve as descriptors of the environment that help identify its mathematical model (the cost map of the terrain).

The terrain learned by the recurrent high order neural network (RHONN) is unstructured, or rough, terrain found in outdoor environments such as valleys, mountains, rivers and roads. The elements of this class of terrain cannot be labeled simply as traversable or non-traversable. Instead, they lie on a continuous interval between those two states that measures how difficult it is for a robot to traverse them.

Several related works address unstructured terrain mapping. LEARCH (LEArning to seaRCH) is a family of imitation learning algorithms [3] that seeks a cost function under which a path planner reproduces the path generated by an expert. Its advantage is that no a priori knowledge of the terrain is needed to generate the cost map. However, the accuracy of the cost map degrades in cells that are far from the expert's path. It also requires retraining for each new environment, and the training phase is computationally expensive, so it cannot be performed online.

In [2], a probabilistic cost map is proposed in which each cell of the map grid is represented by a distribution over the probable cost of the terrain rather than by a single scalar value. This approach makes heuristic planning algorithms such as A* more efficient in time and memory. However, because there is no automated methodology for assigning the value of each grid cell, these values must be assigned by a human expert.

The authors of [1] define a terrain traversability gradation method that uses an online unsupervised learning algorithm to classify images obtained from a stereo camera, based on the robot's previous experiences. The algorithm is very flexible, since it can adapt to new scenarios by comparing terrain images with previously seen ones. Its drawback is that memory requirements grow with the number of stored images, and the number of comparisons becomes computationally expensive.

In [5], a cost map is generated from the visual characteristics of the road. An SVM (Support Vector Machine) is trained to weight the pixels of the image and determine whether they belong to a valid path. This approach starts from a small database built by a human expert and expands it with its own experience through online training. However, it is limited to immediate local planning of the area to be crossed.

In [4], a robot dog is trained to classify the cost of crossing the terrain, where each point of the terrain is modeled by a paraboloid. The dog labels the points it walks over as good or bad, and the cost of new terrain is obtained from this classification and some of the paraboloid parameters. The disadvantage of this approach is that the learned data are used only as a lookup table. Points that are not in the database therefore cannot be classified correctly, or else a very large database is required to classify the points correctly.

Recurrent High Order Neural Networks (RHONNs), as described in [6, 7], are well suited to modeling and identifying both linear and nonlinear dynamical systems. This property is useful because the environment in which a robot moves can change when external agents modify its path. Unstable weather in which rain turns streams into floods, or an earthquake that generates debris, can transform a previously safe path into a practically impassable one. Taking these aspects into account, any real navigation environment can be considered a dynamical system.

This paper is organized as follows. Section 2 introduces RHONNs trained with the Extended Kalman Filter (EKF). Section 3 presents the design of our approach to learning traversability cost maps with EKF-trained RHONNs, together with the experimental results obtained with it. Section 4 presents the conclusions of this work.

2 Recurrent High Order Neural Networks

Recurrent high order neural networks (RHONNs) are discrete-time nonlinear neural networks that have been widely used to model dynamical systems [6, 7]. The RHONN model is described by:

$$\begin{aligned} {x}_{i}(k + 1) = w_{i}^\top \! z_{i}( {x}(k) , u(k) ), \ i = 1,\cdots ,n, \end{aligned}$$
(1)

where \(x_{i}\) is the state of the i-th neuron, n is the number of neurons, \(u = [u_{1}, u_{2}, \cdots , u_{m}]^{\top }\) is the input vector to the neural network, \(w_{i}\) is the adjustable weight vector of the i-th neuron, and \(z_{i}\) is given by:

$$\begin{aligned} z_{i}(x(k), u(k))&= \begin{bmatrix} z_{i_{1}} & z_{i_{2}} & \cdots & z_{i_{L_{i}}} \end{bmatrix}^\top = \begin{bmatrix} \prod_{j\in I_{1}}\xi _{i_{j}}^{d_{i_{j}}(1)} & \prod_{j\in I_{2}}\xi _{i_{j}}^{d_{i_{j}}(2)} & \cdots & \prod_{j\in I_{L_{i}}}\xi _{i_{j}}^{d_{i_{j}}(L_{i})} \end{bmatrix}^\top , \end{aligned}$$
(2)

where \(\{I_{1},I_{2},\cdots ,I_{L_{i}}\}\) is a collection of unordered subsets of \(\{1,2,\cdots ,m + n\}\), \(L_{i}\) is the number of high-order connections, \(d_{i_{j}}(\cdot ) \in \mathbb {Z}_{> 0}\), and \(\xi _{i}\) is defined as:

$$\begin{aligned} \xi _{i}&= \begin{bmatrix} \xi _{i_{1}} & \cdots & \xi _{i_{n}} & \xi _{i_{n + 1}} & \cdots & \xi _{i_{n + m}} \end{bmatrix}^\top = \begin{bmatrix} S(x_{1}(k)) & \cdots & S(x_{n}(k)) & u_{1}(k) & \cdots & u_{m}(k) \end{bmatrix}^\top . \end{aligned}$$
(3)

Here \(S(\cdot )\) is a sigmoid function, usually defined as:

$$\begin{aligned} S(\varsigma ) = \frac{1}{1 + \exp (-\beta \varsigma )},\ \beta >0, \text { where } \varsigma \in \mathbb {R}. \end{aligned}$$
(4)
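Equations (1) to (4) can be traced in a short sketch of one RHONN prediction step. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation. The function names (`sigmoid`, `xi_vector`, `high_order_terms`, `rhonn_step`) are assumptions. Each neuron i is described by its weight list, its index sets \(I_{1},\cdots ,I_{L_{i}}\) (1-based, as in \(\{1,\cdots ,m+n\}\)) and the matching exponents \(d_{i_{j}}(\cdot )\).

```python
import math
from typing import List, Sequence


def sigmoid(s: float, beta: float = 1.0) -> float:
    """Eq. (4): S(s) = 1 / (1 + exp(-beta * s)) with beta > 0."""
    return 1.0 / (1.0 + math.exp(-beta * s))


def xi_vector(x: Sequence[float], u: Sequence[float], beta: float = 1.0) -> List[float]:
    """Eq. (3): [S(x_1), ..., S(x_n), u_1, ..., u_m]."""
    return [sigmoid(xk, beta) for xk in x] + list(u)


def high_order_terms(xi: Sequence[float],
                     index_sets: Sequence[Sequence[int]],
                     exponents: Sequence[Sequence[int]]) -> List[float]:
    """Eq. (2): z_l = prod_{j in I_l} xi_j ** d_j(l), one entry per subset I_l.

    Indices in index_sets are 1-based, as in {1, ..., m + n}.
    """
    z = []
    for subset, powers in zip(index_sets, exponents):
        term = 1.0
        for j, d in zip(subset, powers):
            term *= xi[j - 1] ** d
        z.append(term)
    return z


def rhonn_step(weights, x, u, index_sets, exponents, beta=1.0):
    """Eq. (1): x_i(k+1) = w_i^T z_i(x(k), u(k)) for i = 1, ..., n.

    weights[i], index_sets[i] and exponents[i] describe neuron i.
    """
    xi = xi_vector(x, u, beta)
    x_next = []
    for w_i, sets_i, pows_i in zip(weights, index_sets, exponents):
        z_i = high_order_terms(xi, sets_i, pows_i)
        x_next.append(sum(w * z for w, z in zip(w_i, z_i)))
    return x_next


# One neuron, one input: z_1 = [S(x), u, S(x) * u] with S(0) = 0.5 and u = 2.
print(rhonn_step([[1.0, 0.5, 2.0]], [0.0], [2.0],
                 [[[1], [2], [1, 2]]], [[[1], [1], [1, 1]]]))  # [3.5]
```

Because each neuron keeps its own index sets, different neurons can use different high-order connections, as the subscript i in \(z_{i}\) allows.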

2.1 The Extended Kalman Filter (EKF) Training Algorithm

There are several methods for training neural networks. Here we use the EKF algorithm [7], which has a remarkable rate of convergence. The algorithm is described by:

$$\begin{aligned} K_{i}(k)&= P_{i}(k)H_{i}(k)[R_{i}(k) + H_{i}^\top \!(k)P_{i}(k)H_{i}(k)]^{-1}, \end{aligned}$$
(5)
$$\begin{aligned} w_{i}(k + 1)&= w_{i}(k) + \eta _{i}K_{i}[y(k) - \hat{y}(k)],\end{aligned}$$
(6)
$$\begin{aligned} P_{i}(k + 1)&= P_{i}(k) - K_{i}(k)H_{i}^\top (k)P_{i}(k) + Q_{i}(k), \end{aligned}$$
(7)

where \(P_{i} \in \mathbb {R}^{L_{i}\times L_{i}}\) is the prediction-error covariance matrix, \(w_{i} \in \mathbb {R}^{L_{i}}\) is the weight vector of the i-th neuron, \(L_{i}\) is the number of weights of that neuron, \(y \in \mathbb {R}^m\) is the desired output vector, \(\hat{y} \in \mathbb {R}^m\) is the network output, \(\eta _{i}\) is a learning-rate parameter, \(K_{i} \in \mathbb {R}^{L_{i}\times m}\) is the Kalman gain matrix, \(Q_{i} \in \mathbb {R}^{L_{i}\times L_{i}}\) is the state-noise covariance matrix, \(R_{i} \in \mathbb {R}^{m \times m}\) is the measurement-noise covariance matrix, and \(H_{i} \in \mathbb {R}^{L_{i}\times m}\) is defined as:

$$\begin{aligned} H_{ij}(k)=\left[ \frac{\partial {\hat{y}(k)}}{\partial {w_{ij}(k)}}\right] _{w_{i}(k)=\hat{w}_{i}(k+1)},\ i=1,\cdots ,n\ \text {and} \ j=1,\cdots , L_{i}. \end{aligned}$$
(8)

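The matrices \(P_{i}\), \(Q_{i}\) and \(R_{i}\) are typically initialized as diagonal matrices of their respective sizes with positive entries. If both \(P_{i}\) and \(R_{i}\) started as zero matrices, the bracketed term inverted in (5) would be singular at k = 0.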
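One EKF weight update can be written out for a single neuron with a scalar output (m = 1). In that case the bracketed term in (5) is a scalar and the inverse is a division. This is a minimal sketch under assumptions that are not stated in the text. First, the prediction is taken as \(\hat{y} = w_{i}^\top z_{i}\) with \(z_{i}\) treated as independent of \(w_{i}\), so \(H_{i} = z_{i}\). Second, \(R_{i} = r\) and \(Q_{i} = qI\) are hypothetical scalar choices. The name `ekf_update` is illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def ekf_update(w: Sequence[float], P: Matrix, z: Sequence[float], y: float,
               eta: float = 1.0, r: float = 1.0, q: float = 0.0
               ) -> Tuple[List[float], Matrix, float]:
    """One EKF weight update, Eqs. (5)-(7), for a neuron with a scalar output.

    Assumes y_hat = w^T z, so H = d(y_hat)/dw = z, and uses R = r, Q = q * I.
    Returns the new weights, the new covariance P and the error y - y_hat.
    """
    L = len(w)
    H = list(z)
    y_hat = sum(wj * zj for wj, zj in zip(w, z))
    # P H (an L-vector) and the scalar R + H^T P H that Eq. (5) inverts.
    PH = [sum(P[row][col] * H[col] for col in range(L)) for row in range(L)]
    s = r + sum(H[row] * PH[row] for row in range(L))
    K = [v / s for v in PH]                                # Eq. (5)
    e = y - y_hat
    w_new = [wj + eta * kj * e for wj, kj in zip(w, K)]    # Eq. (6)
    # H^T P (a row vector), then Eq. (7): P - K H^T P + Q.
    HtP = [sum(H[row] * P[row][col] for row in range(L)) for col in range(L)]
    P_new = [[P[row][col] - K[row] * HtP[col] + (q if row == col else 0.0)
              for col in range(L)] for row in range(L)]
    return w_new, P_new, e


# Fit y = 1 + 2x with z = [1, x], starting from a large diagonal P.
w, P = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
for k in range(200):
    x = (k % 10) / 10
    w, P, _ = ekf_update(w, P, [1.0, x], 1.0 + 2.0 * x)
print(w)  # close to [1.0, 2.0]
```

With \(\eta _{i} = 1\), \(Q_{i} = 0\) and a model that is linear in the weights, this update reduces to recursive least squares. The diagonal initial \(P\) keeps the first gain well defined.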

3 Cost Map Identification Using RHONNs

A map of the environment can be seen as an \(m\times n\) grid \(\mathcal {E}\) whose cells \(u_{ij}\) are feature vectors \(u_{ij} \in \mathcal {F}\), where \(\mathcal {F} \subseteq \mathbb {R}^l\) is the feature space and l is the number of terrain features that describe the environment. A path planner needs a transit cost map to generate an optimal path through the environment. To calculate this cost map we therefore need a cost function \(C: \mathcal {F} \rightarrow \mathbb {R}\) that assigns each feature vector a transit cost \(t_{\mathrm {c}} = C(u) \in \mathbb {R}\), i.e., a function that maps the features describing the environment to a traversal cost.
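The role of the cost function C can be illustrated by applying it cell by cell to turn a grid of feature vectors into a cost map that a planner can consume. This is a hypothetical sketch. `toy_cost` is an arbitrary stand-in for C, not the expert's mapping or the learned RHONN. Dijkstra's algorithm on a 4-connected grid stands in for the path planner, which the text does not specify.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Feature = Sequence[float]      # one cell u_ij: a feature vector in F (subset of R^l)
Cell = Tuple[int, int]


def build_cost_map(env: Sequence[Sequence[Feature]],
                   C: Callable[[Feature], float]) -> List[List[float]]:
    """Apply the cost function C: F -> R to every cell of the m x n map."""
    return [[C(u) for u in row] for row in env]


def toy_cost(u: Feature) -> float:
    """Hypothetical stand-in for C: unit cost plus the sum of the features."""
    return 1.0 + sum(u)


def cheapest_path_cost(costs: List[List[float]], start: Cell, goal: Cell) -> float:
    """Dijkstra on a 4-connected grid where entering cell (i, j) costs costs[i][j]."""
    m, n = len(costs), len(costs[0])
    best: Dict[Cell, float] = {start: 0.0}
    frontier = [(0.0, start)]
    while frontier:
        d, (i, j) = heapq.heappop(frontier)
        if (i, j) == goal:
            return d
        if d > best[(i, j)]:
            continue  # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < m and 0 <= b < n:
                nd = d + costs[a][b]
                if nd < best.get((a, b), math.inf):
                    best[(a, b)] = nd
                    heapq.heappush(frontier, (nd, (a, b)))
    return math.inf


# 2 x 2 map with four features per cell; the top-right cell is hard to cross.
env = [[[0, 0, 0, 0], [1, 1, 1, 1]],
       [[0, 0, 0, 0], [0, 0, 0, 0]]]
costs = build_cost_map(env, toy_cost)              # [[1.0, 5.0], [1.0, 1.0]]
print(cheapest_path_cost(costs, (0, 0), (1, 1)))   # 2.0
```

Replacing `toy_cost` with a learned model changes only the cost map, not the planner.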

We propose using a RHONN to learn the cost function C because of its generalization and knowledge-reuse capabilities. The dynamic rough terrain in which the agent is going to navigate can be seen as a complex dynamical system to be identified and learned by the RHONN. The huge variety of these environments calls for a learning algorithm that avoids having to present every variation of the environment to the navigation system in order to obtain a traversal cost function for each one. The good generalization of RHONNs [7] mitigates this issue by letting a smaller body of knowledge handle a wider variety of cases. Furthermore, using a RHONN to identify and learn a traversal cost function allows the navigation agent to reuse knowledge (cost functions) learned in previous navigation episodes when the environment is similar. This is possible because RHONNs are based on Hopfield networks, which are well known for their associative memory capabilities [6].

3.1 Experimental Results

To evaluate RHONNs as cost functions for rough terrain, we designed the following methodology. Each patch of rough terrain was divided into a grid (as can be seen in Figs. 13 and 16). Each cell of this grid was described by four real values that represent features of the terrain: slope, quantity of rubble present, density of vegetation and water depth (grids of Figs. 1, 3, 5, 7, 10 and 13). Using these descriptors, a human expert produced a traversal cost map of the terrain. That is, the expert mapped the four features of each grid cell to a real value representing how difficult it is for a navigation agent to cross that patch of terrain (first row of Figs. 2, 4, 6, 8, 11 and 14). We used these expert-generated traversal cost maps as training data for the RHONN. We then compared the traversal cost map learned by the RHONN (second row of Figs. 2, 4, 6, 8, 11 and 14) with the one generated by the human expert to obtain the precision error of our system.

Fig. 1. Randomly generated feature map.

Table 1. Generalization test of Fig. 1.
Fig. 2. Comparison of the expert’s and RHONN’s outputs for Fig. 1.

Fig. 3. Map generated from Fig. 1 after 90% of the cells changed.

For our experiments we used a RHONN with input \(u = [u_{1}, u_{2}, u_{3}, u_{4}]^{\top } \in \mathbb {R}^4\), where the features \(u_{i}\) represent slope, rubble, density of vegetation and density of water, in that order. The output is \(\hat{y} = x \in \mathbb {R}\), where x is the single state of the network, and \(\{I_{1},I_{2},\cdots , I_{20} \}\) = {[0 1], [0 2], [0 3], [0 4], [0 5], [1 1], [1 2], [1 3], [1 4], [1 5], [2 2], [2 3], [2 4], [2 5], [3 3], [3 4], [3 5], [4 4], [4 5], [5 5]}. Figures 1 and 5 are randomly generated maps, each used to train a different RHONN, with the expert's mapping examples in Figs. 2 and 6 as training data.
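The 20 index sets above are exactly the pairs [a b] with 0 ≤ a ≤ b ≤ 5, excluding [0 0]. They can therefore be generated rather than typed by hand. The sketch below assumes an interpretation that the text does not state. Index 0 is taken to denote a constant entry equal to 1, so that [0 j] yields a first-order term. Indices 1 to 5 follow (3) with n = 1 and m = 4: \(\xi = [S(x), u_{1}, u_{2}, u_{3}, u_{4}]\). A repeated index [j j] then stands for \(\xi _{j}^2\). The function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement
from typing import List, Sequence, Tuple


def index_pairs(num_xi: int = 5) -> List[Tuple[int, int]]:
    """All pairs (a, b) with 0 <= a <= b <= num_xi except (0, 0).

    With num_xi = 5 this reproduces the 20 subsets listed in the text, in order.
    """
    pairs = combinations_with_replacement(range(num_xi + 1), 2)
    return [p for p in pairs if p != (0, 0)]


def regressor(x: float, u: Sequence[float], beta: float = 1.0) -> List[float]:
    """High-order terms z_l = xi[a] * xi[b] for xi = [1, S(x), u_1, ..., u_4].

    xi[0] = 1 is the ASSUMED meaning of index 0 (a bias entry).
    """
    xi = [1.0, 1.0 / (1.0 + math.exp(-beta * x))] + list(u)
    return [xi[a] * xi[b] for a, b in index_pairs(len(xi) - 1)]


def predict_cost(w: Sequence[float], x: float, u: Sequence[float]) -> float:
    """Eq. (1) with n = 1: the next state, read as the cell's cost y_hat."""
    return sum(wl * zl for wl, zl in zip(w, regressor(x, u)))


print(len(index_pairs()))                       # 20
print(regressor(0.0, [1.0, 2.0, 3.0, 4.0])[:3])  # [0.5, 1.0, 2.0]
```

Under this reading, the network is a full quadratic model in \(S(x)\) and the four terrain features, with 20 weights to be trained by the EKF.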

Then, to test the generalization capabilities of the RHONNs, we simulated a dynamic environment by gradually and randomly altering the values of the elements of each feature vector by up to 30% until 90% of the map had changed. Tables 1 and 2 report the mean squared error of the RHONNs.
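The generalization protocol can be sketched as a seeded perturbation of the feature map followed by a mean squared error between two cost maps. The reading of "up to 30%" is an assumption. Here every feature of a chosen cell is multiplied by a factor drawn uniformly from [0.7, 1.3]. "Gradually" is modeled by visiting cells in one fixed random order, so that calls with the same seed and a growing `changed_fraction` produce nested maps. The names `perturb_map` and `mean_squared_error` are hypothetical.

```python
import random
from typing import List, Sequence

Grid = Sequence[Sequence[Sequence[float]]]


def perturb_map(env: Grid, changed_fraction: float = 0.90,
                max_rel_change: float = 0.30, seed: int = 0) -> List[List[List[float]]]:
    """Return a copy of env in which a fraction of the cells is perturbed.

    Each chosen cell has every feature scaled by a factor drawn uniformly
    from [1 - max_rel_change, 1 + max_rel_change]. Cells are visited in one
    seeded random order, so the same seed with an increasing changed_fraction
    gives nested, gradually growing changes.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(env)) for j in range(len(env[i]))]
    rng.shuffle(cells)
    new_env = [[list(u) for u in row] for row in env]
    for i, j in cells[:round(changed_fraction * len(cells))]:
        new_env[i][j] = [v * (1.0 + rng.uniform(-max_rel_change, max_rel_change))
                         for v in new_env[i][j]]
    return new_env


def mean_squared_error(expert: Sequence[Sequence[float]],
                       model: Sequence[Sequence[float]]) -> float:
    """Mean of the squared cell-wise differences between two cost maps."""
    diffs = [(a - b) ** 2 for ra, rb in zip(expert, model) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


print(mean_squared_error([[1.0, 2.0]], [[1.0, 4.0]]))  # 2.0
```

In a full experiment, the expert's cost map of each perturbed grid would be compared with the RHONN's predicted map through `mean_squared_error`.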

Figures 4 and 8 show that the RHONN approximates the expert's cost very well even when the environment differs from the training set.

Finally, we tested the RHONN on real-world environments, shown in Figs. 9 and 12.

Fig. 4. Comparison of the expert’s and RHONN’s outputs for Fig. 3.

Table 2. Generalization test of Fig. 5.
Fig. 5. Randomly generated feature map.

Fig. 6. Comparison of the expert’s and RHONN’s outputs for Fig. 5.

Fig. 7. Map generated from Fig. 5 after 90% of the cells changed.

Fig. 8. Comparison of the expert’s and RHONN’s outputs for Fig. 7.

Fig. 9. Satellite map of a grove.

Fig. 10. Feature map of Fig. 9.

Fig. 11. Comparison of the expert’s and RHONN’s outputs for Fig. 9.

Fig. 12. A view of a golf course.

Fig. 13. Feature map of Fig. 12.

Fig. 14. Comparison of the expert’s and RHONN’s outputs for Fig. 12.

4 Conclusion

The tests applied to the RHONNs show very encouraging results. They indicate that RHONNs learn the model of the dynamic environment in which a robot could navigate very well, even when the maps are built from an expert's criterion, and without adding complexity to the RHONN model.

Furthermore, the experimental results show that the RHONNs learn the cost function of the environment even when sudden changes occur. That is, the RHONNs returned the correct traversal cost for a patch of terrain belonging to a previously learned environment even after the descriptors of that patch had changed because the terrain suffered a sudden change (e.g., an unforeseen landslide or flood). These good results can be attributed to the ability of RHONNs to identify and model complex dynamical systems. In future work, the RHONN-based identification system will be implemented to learn a robot's navigation environment in real time.