
1 Introduction

Autonomous mobile robot navigation in natural environments is a challenging task, due to the variety of environment configurations a mobile platform may encounter and the lack of precise prior knowledge of terrain conditions [1]. It is therefore important to design a highly autonomous recognition system that allows the mobile platform to rely less on human instructions when assessing terrain traversability, i.e., a terrain representation that can evolve through interaction with new terrain scenarios without referring to a predefined set of terrain classes such as grass, rocks, or mud.

Several sensors have been used to perceive the environment in traversability frameworks. They can be categorized as exteroceptive and proprioceptive sensors [2]. Exteroceptive sensors such as cameras and lasers give a direct representation of the environment. In addition, wheel encoders and proprioceptive sensors such as vibration sensors, current sensors, and gyroscopes provide complementary information about the environment; hence the importance of combining both types of sensors to improve terrain recognition and reduce the dependency on human design.

A speed control system was developed based on roughness identification using a 2D laser [3]. Terrain scans return the heights of points on the soil, which are compared with flatness measures obtained by scanning the height of a flat surface in front of the vehicle. By comparing the four parameters used to describe terrain roughness with predefined thresholds, the speed is either reduced, increased, or maintained at its current level.

Shape attributes of the ground have been extensively investigated for traversability estimation. The idea is to use 3D environment analysis methods and chart the mobility smoothness in a hierarchical fashion. Based on this, 3D ladar information was clustered according to scatterness (grass), surfaceness (ground), and linearness (tree trunks) for terrain modeling [4]. 2D lasers are plane scanners and thus unable to describe the geometry of the soil; obstacle detection (e.g., [5]) may also fail if the scanned plane is too high. On the other hand, 3D lasers or lidars provide a detailed representation of the scenery [6], but they remain costly and consume more energy.

Cameras are undoubtedly preferred since they are more affordable, easier to integrate, and can furnish various kinds of information such as color and texture. In this context, several methods have been implemented that use both image and geometry data for traversability estimation.

Investigating the elevation of the height profile allowed detection and categorization of obstacles into negative (ditches, holes) and positive (bumps) obstacles [7]. Terrain categorization was done through classification of color information within predefined material classes. The degree of traversability of an obstacle was inferred by combining range information with the classification outcome. In the same context, an algorithm for self-supervised terrain classification combining geometrical and visual features was proposed [8]. With regard to image information, HSV color expressed by a four-dimensional vector and wavelet-based texture information were used. Features are then fused using a naive Bayes classifier. Both methods lack low-level feedback information (proprioceptive data) related to the moving platform that could be used as ground truth.

In the context of proprioceptive data, vibrations generated by the interaction of the wheels with the soil were used to perform terrain classification [9]. The acceleration signal is divided into small sections from which the Power Spectral Density (PSD) is evaluated. Linear Discriminant Analysis (LDA) was employed to categorize terrain samples. Vibration data may provide a good understanding of robot behavior during navigation, but it is not enough to anticipate upcoming terrains and scenarios such as obstacles, since it is limited to the terrain underfoot.

Recent research projects have shown the effectiveness of learning from both exteroceptive and proprioceptive sensory information to estimate vehicle behavior on subsequent terrains. Image color information was employed by Krebs et al. to approximate the interaction of the mobile platform with the ground based on a classification framework [10]. The goal of the training process was to learn terrain understanding through interaction with several terrain configurations. Rover Terrain Interaction (RTI) attributes were combined with corresponding exteroceptive data to build an inference model, which proved effective for RTI parameters such as slip and vibration.

Regression of terrain traversability based on robot pose was proposed by Ho et al. [11]. Vehicle attitude on deformable terrains was estimated from an approximation obtained on rigid ground and from terrain geometry attributes using Gaussian Process (GP) regression [15]. For the final objective of navigation, geometry is just an intermediate representation: it is possible to directly predict motion properties from other sensory modalities, such as cameras, which can provide a 3D map of the environment.

Image texture information can furnish a detailed representation of terrain nature due to the high dimension of the feature vector compared with color information [10], and thus enables good traversability prediction. Nonetheless, classifying terrain using only texture attributes may not be suitable. Another issue with classification frameworks is the inconsistency of the assigned labels, since labeling relies on the judgment of the designer.

This paper addresses the approximation of the terrain signature, i.e., vibrations, from mobile robot motion information based on a learned data association of exteroceptive and proprioceptive information using GP regression. With regard to exteroceptive data, texture information based on the fractal dimension is used to describe terrain configurations. The vibration, i.e., the vertical acceleration generated by the interaction between the mobile platform and the soil, is considered in this case study. Compared with semi-supervised methods, the proposed framework uses the acceleration signals obtained from the interaction of the vehicle with the soil as ground truth. The proposed approach is able to estimate the vibration that the mobile platform may encounter when running on a terrain, regardless of its nature.

2 Problem Definition

2.1 Objective

A camera and an acceleration sensor are mounted on a four-wheel mobile platform as shown in Fig. 1. The camera acquires terrain images and the acceleration sensor records the acceleration signal generated during a run. The target is to predict a vibration feature, as a measure of traversability, from terrain images only. This information will be used to understand the vehicle behavior on subsequent terrains and is expected to enable a safe run.

2.2 Methodology

From terrain images, Regions of Interest (ROIs) are determined for image feature extraction. In this case study, two terrain patches (ROIs) are used from every single image. The texture attribute, evaluated by measuring the fractal dimension [13], is retained. The recorded acceleration signal is divided into short sequences to measure motion features corresponding to the extracted terrain patches. For function approximation, GP regression is employed to learn the correlation between texture and vibration features. Based on this result, vibration features are predicted using image information only.

Fig. 1. Pioneer 3AT hardware configuration.

3 Regression of Motion Information

3.1 Terrain Patches Identification

From terrain images, patches corresponding to a short range distance are considered for texture extraction. For this, the camera model is used to define the ROIs (refer to [12] for further details). As shown in Fig. 2, the camera is characterized by the camera frame \(\lbrace {O_c,x_c,y_c,z_c}\rbrace \) with center \(O_c\). The following equation maps a point \(P \in \mathbb {R}^4\) in the world reference frame to a point \(p\in \mathbb {R}^3\) on the image plane (both in homogeneous coordinates, with one as the last element):

$$\begin{aligned} p=\mathbf K [\mathbf R |t]P, \end{aligned}$$
(1)

where \([\mathbf R |t] \in \mathbb {R}^{4\times 4}\) is the homogeneous extrinsic parameter matrix, and \(\mathbf K \in \mathbb {R}^{3\times 4}\) is the camera parameter matrix.

Fig. 2. Camera frame for mapping a 3D point onto the image plane.

The above-mentioned model is used to determine the terrain the vehicle is traversing, using the robot's wheel base L and wheel width W. As shown in Fig. 3, both the camera and world reference frames are located at the center of the platform. The world reference frame is given by \(\lbrace {O_w,x_w,y_w,z_w}\rbrace \) with origin \(O_w\). The goal of this process is to map the points that identify the inner and outer bounds of the terrain region crossed by the vehicle. Since all points lie on the same line, only the depth component along the \(z_w\)-axis varies. For the left and right sides, these points are expressed as follows:

$$\begin{aligned} \left\{ \begin{array}{l} P_{\text {ro}} = [{\frac{L}{2}}+W,\quad 0,\quad z_w] \\ P_{\text {ri}} = [{\frac{L}{2}},\quad 0,\quad z_w]\\ P_{\text {lo}} = [-{\frac{L}{2}}-W,\quad 0,\quad z_w] \\ P_{\text {li}} = [-{\frac{L}{2}},\quad 0,\quad z_w] \end{array}\right. , \end{aligned}$$
(2)

where \(P_{\text {ro}}\), \(P_{\text {ri}}\) denote the points lying on the right outer and inner bounds respectively, and \(P_{\text {lo}}\), \(P_{\text {li}}\) those on the left outer and inner bounds. The result of this operation is shown in Fig. 4. To increase the number of terrain samples, two terrain patches are extracted from a single image, as in the sketch below. A warp perspective transformation is performed to remove the trapezoidal shape introduced by the camera projection.
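To make the geometry concrete, the following is a minimal sketch of the patch identification, assuming numpy and OpenCV. The calibration matrices K, R, t, the wheel geometry values, and the pixel corners passed to rectify_patch are placeholders, not values from the paper; here K is factored as the usual 3x3 intrinsic matrix with a 3x4 extrinsic matrix, which is equivalent to the projection of Eq. (1).

```python
import numpy as np
import cv2

def project_points(P_world, K, R, t):
    """Project Nx3 world points onto the image plane: p = K [R|t] P (Eq. 1)."""
    Rt = np.hstack([R, t.reshape(3, 1)])                    # 3x4 extrinsics
    P_h = np.hstack([P_world, np.ones((len(P_world), 1))])  # homogeneous coords
    p = (K @ Rt @ P_h.T).T
    return p[:, :2] / p[:, 2:3]                             # pixel coordinates

# Wheel-track bounds of Eq. (2) at a chosen depth z_w (values are assumptions).
L, W, z_w = 0.4, 0.1, 1.0     # wheel base, wheel width, depth [m]
bounds = np.array([[ L/2 + W, 0, z_w],    # P_ro
                   [ L/2,     0, z_w],    # P_ri
                   [-L/2 - W, 0, z_w],    # P_lo
                   [-L/2,     0, z_w]])   # P_li

def rectify_patch(image, corners_px, size=(128, 128)):
    """Warp a trapezoidal terrain patch to a rectangle (perspective removal)."""
    dst = np.float32([[0, 0], [size[0], 0], [size[0], size[1]], [0, size[1]]])
    H = cv2.getPerspectiveTransform(np.float32(corners_px), dst)
    return cv2.warpPerspective(image, H, size)
```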

Fig. 3. Mobile robot carrying a camera, and layout of the world and camera reference frames.

Fig. 4. Identification of terrain patches for image feature extraction.

3.2 Texture Attribute Extraction

Texture attributes are extracted using Segmentation-based Fractal Texture Analysis (SFTA) [13]. The SFTA method consists of two steps. In the first step, the input grayscale image is decomposed into binary images using Two-Threshold Binary Decomposition (TTBD) [13]. In the second step, the fractal dimension is computed from the region boundaries of each binary image. TTBD computes a set of thresholds T, which is achieved by the multi-level Otsu algorithm [14]. After obtaining the desired number of thresholds \(n_\text {t}\), pairs of contiguous thresholds from \(T\cup \{n_\text {l}\}\), with \(n_\text {l}\) the highest gray level in the input image, together with pairs of thresholds \(\{t,n_\text {l}\}\) with \(t \in T\), are employed to obtain the binary images as follows

$$\begin{aligned} I^{b}_{{x,y}}= {\left\{ \begin{array}{ll} 1, &{}\text {if } t_\text {l} < I(x,y) \le t_\text {u} \\ 0, &{} \text {otherwise} \end{array}\right. }, \end{aligned}$$
(3)

where \(I^{b}_{{x,y}}\) denotes the binary value of the image pixel (x, y), and \(t_\text {l}\) and \(t_\text {u}\) denote the lower and upper threshold values respectively. Consequently, \(2n_\text {t}\) binary images are obtained. In this paper, the SFTA feature vector includes only the fractal dimensions of the boundaries. \(\varDelta (x,y)\) represents the border image of the binary image \(I^{b}_{{x,y}}\), and is evaluated as

$$\begin{aligned} \varDelta (x,y)= {\left\{ \begin{array}{ll} 1, &{}\text {if } \exists (x',y') \in N_{x,y } \\ &{}\textit{ s.t. } I^{b}_{{x',y'}}=0 \wedge I^{b}_{{x,y}}=1\\ 0, &{}\text {otherwise} \end{array}\right. }, \end{aligned}$$
(4)

where \(N_{x,y}\) is the 8-connectivity neighborhood of a pixel (x, y). \(\varDelta (x,y)\) takes the value one if the pixel at location (x, y) in the corresponding binary image \(I^{b}_{{x,y}}\) has the value one and has at least one neighbor pixel with value zero. The border image is used to compute the fractal dimension \(D \in \mathbb {R}\) by the box counting algorithm as follows

$$\begin{aligned} D=\lim _{\varepsilon \rightarrow 0} \frac{\log N(\varepsilon )}{\log (\varepsilon ^{-1})}, \end{aligned}$$
(5)

where \(N(\varepsilon )\) is the number of hyper-cubes of side length \(\varepsilon \) needed to cover the object. The resulting SFTA feature vector is denoted by \(\varvec{x}\in \mathbb {R}^{2n_\text {t}}\).
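A minimal SFTA-style sketch of this step, assuming numpy and scikit-image, is given below: TTBD via multi-level Otsu, border extraction per Eq. (4), and box-counting fractal dimension per Eq. (5). The function names, the default number of thresholds, and the box sizes are illustrative choices, not taken from the paper's implementation.

```python
import numpy as np
from skimage.filters import threshold_multiotsu

def box_counting_dimension(border):
    """Estimate D as the slope of log N(eps) vs. log(1/eps) (Eq. 5)."""
    sizes = [2, 4, 8, 16, 32]
    counts = []
    for s in sizes:
        h, w = border.shape
        # count boxes of side s that contain at least one border pixel
        grid = border[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s)
        counts.append(max(np.count_nonzero(grid.any(axis=(1, 3))), 1))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope

def sfta_features(gray, n_t=4):
    # NOTE: the paper uses n_t = 8; a smaller default is kept here because
    # the exhaustive multi-Otsu search grows combinatorially with the classes.
    T = list(threshold_multiotsu(gray, classes=n_t + 1))  # n_t thresholds
    n_l = int(gray.max())
    # pairs of contiguous thresholds from T u {n_l}, plus pairs {t, n_l}
    Tn = T + [n_l]
    pairs = list(zip(Tn[:-1], Tn[1:])) + [(t, n_l) for t in T]
    feats = []
    for t_lo, t_hi in pairs:
        binary = (gray > t_lo) & (gray <= t_hi)           # Eq. (3)
        # border pixel: value one with at least one zero 8-neighbor (Eq. 4)
        pad = np.pad(binary, 1)
        nbrs = sum(np.roll(np.roll(pad, dx, 0), dy, 1)
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1)) - pad
        border = binary & (nbrs[1:-1, 1:-1] < 8)
        feats.append(box_counting_dimension(border))
    return np.array(feats)                                # x in R^{2 n_t}
```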

3.3 Motion Feature Extraction

Let \(a_k, k=1,\cdots ,K\) be the vertical acceleration signal, representing the vibrations during a run over a short range distance, where k is the time step and K the total number of time steps. The amplitude distance is retained to describe the motion of the platform, and is given by

$$\begin{aligned} a_{k_\text {d}} = \max _{k=1,\cdots ,K} a_k - \min _{k=1,\cdots ,K} a_k. \end{aligned}$$
(6)
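As a minimal illustration, the feature of Eq. (6) reduces to a one-line computation per acceleration window (the extraction of the window itself is described in Sect. 3.4); the array a is assumed to hold the vertical accelerations recorded over the short range distance.

```python
import numpy as np

def amplitude_distance(a):
    """Amplitude distance of Eq. (6): max_k a_k - min_k a_k."""
    return float(np.max(a) - np.min(a))
```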
Fig. 5. Data acquisition.

3.4 Matching Image and Motion Features

As mentioned above, two short terrain segments are considered for image feature extraction. Accordingly, the acceleration segment relevant for motion feature extraction is matched to the image features using coordinate transformation and odometry. As shown in Fig. 5, at time \(t=0\) the robot is located at the origin of the world reference frame. The location at which a new picture is taken is denoted by \(Z_{I_i(x,y)}\) and is expressed as

$$\begin{aligned} Z_{I_i(x,y)}=id_s, \quad i = 1,\cdots ,N_\text {image}, \end{aligned}$$
(7)

where \(d_s\) denotes the sampling distance and \(N_\text {image}\) is the total number of terrain images acquired during a run. Due to the camera tilt angle \(\alpha \), terrains are covered only from a certain position, expressed as

$$\begin{aligned} Z_{I_i(x,y)} + Z_{\text {BZ}}, \quad i = 1,\cdots ,N_\text {image}, \end{aligned}$$
(8)

where \(Z_{\text {BZ}}\) is the area not covered by the image, called the blind zone. As mentioned above, a short range distance l covered by the images is retained for image feature extraction. One reason for this is that the further a pixel (x, y) is from the camera focal point, the more noise it is subject to, so the visual representation of the environment may not be faithful to the real world. The acceleration signal sequence generated over the distance l is used for motion feature extraction, and is bounded according to the ROI as follows

$$\begin{aligned} A_i = [Z_{I_i(x,y)} + Z_{\text {BZ}}, Z_{I_i(x,y)} + Z_{\text {BZ}} + l], \end{aligned}$$
(9)

where \(A_i\) denotes the ROI.
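A sketch of this matching step, Eqs. (7)-(9), is given below, assuming odometry provides a cumulative travelled distance for every acceleration sample. The values of d_s and l mirror Sect. 4.1, while z_bz is an assumed placeholder (the paper does not report the blind zone length); array names are illustrative.

```python
import numpy as np

def acceleration_window(acc, dist, i, d_s=0.5, z_bz=0.6, l=0.25):
    """Return the acceleration samples A_i recorded over the ROI of image i.

    acc  : 1-D array of vertical accelerations a_k
    dist : odometry distance [m] travelled at each sample k
    """
    z_start = i * d_s + z_bz     # Eq. (8): image position plus blind zone
    z_end = z_start + l          # Eq. (9): ROI of length l
    k = np.searchsorted(dist, [z_start, z_end])
    return acc[k[0]:k[1]]
```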

3.5 Regression Analysis using Gaussian Process (GP)

A Gaussian process [15] is used for regression analysis. Let \(\mathcal {D}=\left\{ (\varvec{x}_j,y_j),\right. \) \(\left. j = 1,\cdots ,n \right\} \) denote a set of training samples with pairs of SFTA feature vector inputs \(\varvec{x}_j\) and motion feature outputs \(y_j \in \mathbb {R}\), where n denotes the total number of training samples. The goal is to evaluate the predictive distribution of the function value \(f_{\varvec{*}}\) at a test input \(\varvec{x}_{\varvec{*}}\). The noise is assumed additive, independent, and normally distributed. The relationship between the function \(f(\varvec{x})\) and the observed noisy targets y is given by

$$\begin{aligned} y_j = f(\varvec{x}_j) + \varepsilon _j, \end{aligned}$$
(10)

where \(\varepsilon _j\) is noise following \(\mathcal {N}(0,\sigma _{\text {noise}}^2)\), with \(\sigma _{\text {noise}}^2\) the noise variance. The notation \(\mathcal {N}(\mathbf a ,A)\) is used for the normal distribution with mean a and covariance A. Gaussian Process regression is a Bayesian algorithm that assumes that, a priori, the function values are distributed as

$$\begin{aligned} p(\mathbf f |\varvec{x}_1,\varvec{x}_2,\cdots ,\varvec{x}_n)=\mathcal {N}(0,K), \end{aligned}$$
(11)

where \(\mathbf f =[f_1,f_2,\cdots ,f_n]^\top \) contains the latent function values, \(f_j = f(\varvec{x}_j)\), and K is a covariance matrix whose entries are given by the covariance function, \(K_{ij}=k(\varvec{x}_i,\varvec{x}_j)\). A widely employed covariance function is the squared exponential, given by

$$\begin{aligned} K_{ij}=k(\varvec{x}_i,\varvec{x}_j)=\sigma ^2\exp \left( -\frac{\left( \varvec{x}_i-\varvec{x}_j\right) ^\top \left( \varvec{x}_i-\varvec{x}_j\right) }{2\lambda ^2}\right) , \end{aligned}$$
(12)

where \(\sigma ^2\) controls the variance, and \(\lambda \) is the isotropic lengthscale parameter that describes the smoothness of the function.
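As a concrete illustration of Eqs. (10)-(12), the following is a minimal GP regression sketch in numpy with fixed hyperparameters sigma, lam, and sigma_noise; in practice these would be optimized, e.g., by maximizing the marginal likelihood, and the paper does not specify its implementation.

```python
import numpy as np

def se_kernel(X1, X2, sigma=1.0, lam=1.0):
    """K_ij = sigma^2 exp(-||x_i - x_j||^2 / (2 lam^2)) (Eq. 12)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return sigma**2 * np.exp(-d2 / (2 * lam**2))

def gp_predict(X, y, X_star, sigma=1.0, lam=1.0, sigma_noise=0.1):
    """Posterior mean and variance of f_* given training pairs (X, y)."""
    K = se_kernel(X, X, sigma, lam) + sigma_noise**2 * np.eye(len(X))
    K_s = se_kernel(X, X_star, sigma, lam)
    alpha = np.linalg.solve(K, y)           # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(K, K_s)             # K^{-1} K_s
    var = np.diag(se_kernel(X_star, X_star, sigma, lam)) - np.sum(K_s * v, 0)
    return mean, var
```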

Fig. 6. Natural terrains.

4 Experiment

4.1 Experiment Settings

To predict motion information based on Gaussian Process regression, the vehicle was driven extensively over 30 different outdoor environment configurations to acquire terrain observations (images) and the corresponding acceleration signals. The goal of the experiment is to assess the performance of the proposed framework in predicting motion features. Samples of terrain configurations are shown in Figs. 6 and 7. Training was done using a set of 1208 terrain patches and the corresponding acceleration features. A further set of 109 image features was used for testing. The experiment parameters were set as follows:

  • Camera height h: 540 [mm],

  • Camera tilt angle \(\alpha \): 31[deg],

  • Number of thresholds for SFTA \(n_\text {t}\): 8,

  • Dimension of SFTA feature vector \(\varvec{x}\in \mathbb {R}^{16}\),

  • ROI length l: 250 [mm],

  • Number of images used for function approximation \(N_\text {tr}\): 1208,

  • Number of images used for prediction \(N_\text {pr}\): 109,

  • Acceleration signal sampling period \(T_s\): 10 [ms],

  • Step distance for taking images \(d_s\): 500 [mm].

Fig. 7. Artificial terrains.

4.2 Results

The effectiveness of the proposed approach was evaluated based on whether the real motion feature lies within the \(1\sigma \) confidence interval. The results are shown in Fig. 8. A total of 76 terrain samples of different natures were successfully approximated within the confidence interval. Moreover, the fractal dimension offers a discrimination of terrain texture attributes. Principal Component Analysis (PCA) was applied to the SFTA feature vector to reduce its dimension from \(\varvec{x}\in \mathbb {R}^{16}\) to \(\varvec{z}\in \mathbb {R}^{2}\) in order to visualize the distribution of the motion information. As shown in Fig. 9, the terrain configurations are scattered in different areas.
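A sketch of this evaluation protocol, assuming scikit-learn and the gp_predict sketch above, could read as follows: count the test samples whose true motion feature falls inside the 1-sigma interval, and project the SFTA features to two dimensions with PCA for visualization.

```python
import numpy as np
from sklearn.decomposition import PCA

def one_sigma_success_rate(y_true, mean, var):
    """Fraction of samples with |y - mean| <= sigma (76/109, about 0.70, here)."""
    return float(np.mean(np.abs(y_true - mean) <= np.sqrt(var)))

def project_features(X):
    """Reduce SFTA features from R^16 to R^2 for visualization (Fig. 9)."""
    return PCA(n_components=2).fit_transform(X)
```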

Fig. 8. Approximation of motion features using a Gaussian process.

Fig. 9. Distribution of motion information with regard to SFTA for three terrain types: stones and gravel, grass and leaves, granulated and slick asphalt.

Fig. 10. Difference of terrain configurations for the same ground nature.

Fig. 11. Difference in vibration signal for terrains with and without irregularities: (a) grass, (b) soil.

4.3 Discussion

The remaining 33 samples were not estimated within the confidence interval. In the case of the terrain configurations shown in Fig. 10(a) and (b), where tree roots emerge from the ground and create bumps, the motion data could not be predicted well. When the mobile platform crosses a terrain deformation or irregularity (tree roots, rocks, etc.), the acceleration sensor acquires instantaneous high vibration values, i.e., the motion information is more localized than on uniform terrains, which generate a smoother acceleration signal, as shown in Fig. 11. On the other hand, an image region can include both grass texture and bumps. More localized features (with narrower regions) that take into account the irregularities of the soil would mitigate this problem. A sufficient number of training samples covering terrain irregularities will also be required to achieve a good approximation of the motion features.

5 Conclusion and Future Work

In this paper, we introduced a method to assess terrain mobility for a mobile robot navigating in outdoor environments, based on the data association of exteroceptive data (terrain observations) and proprioceptive information (vibrations of the platform). The SFTA texture attribute was used to describe the nature of terrains. It provides a discrimination of terrain texture information and thus the ability to distinguish terrain natures (stones, grass, and asphalt). Using GP regression, a set of training data covering different terrain configurations was used for learning by combining both image and motion attributes, and a different set of terrain images was used for validation. The obtained results showed good motion data prediction, with a 70% success rate. The SFTA feature vector is a non-localized representation of terrain characteristics, whereas the running information is more local owing to its instantaneous nature. A more localized image representation may be better suited to predicting local motion data when the robot runs on a terrain with irregularities. More general problems can also influence the image features, such as high brightness, which may lead to strong reflectance from the ground. In addition, the perspective effect of the camera reduces the level of detail that can be perceived in the soil.