1 Introduction

Recently, supermarkets (and other similar environments) have become interested in service robots that can provide efficient services to their customers. However, in order to provide an effective service in these environments, mobile robots face a new localization challenge. A regular supermarket consists of a set of long parallel aisles with shelves on both sides. This layout is common to most supermarkets, and the arrangement of the aisles rarely changes, making it a very static environment (Fig. 1).

Fig. 1. Aisles of a supermarket

Due to the repeated structure of the supermarket aisles, classic localization techniques, commonly based on laser range finders, are not enough: many aisles of this environment look exactly the same to the laser sensor, making localization very difficult. Coupled with the loop-closure problem, i.e., recognizing when the robot is in a place it has visited previously, these issues make the robot easily become lost.

The main idea is to use the same information as humans. People normally use the products on the shelves to identify the aisle they are in, so a robot equipped with a camera on a pan/tilt unit can try to segment and identify the products to improve its localization.

Even though the aisles are static, the distribution of the products on the shelves is not. Due to marketing strategies, several changes can occur, for example:

  • Product relocations for sales or promotions

  • A new presentation of an existing product

  • Presentation of a new brand or new products

  • Display of seasonal products

These changes in the distribution of the products make it impossible to build a static visual map from such information, so the robot has to identify the objects as it moves through the aisles and compute the probability of being in a particular aisle.

Identifying the different products on the supermarket shelves is not a trivial task for the robot. On the shelves we can find products of different colors, sizes and shapes, normally placed one next to the other, which makes it difficult for the robot to segment the individual objects. Also, when the robot follows a client it cannot be expected to look at the products directly, because the client normally moves through the aisle in as straight a line as possible. Therefore the robot needs to be able to recognize the products despite considerable perspective distortion in the image.

In order to improve autonomous mobile robot localization, we propose a technique to extract additional information from the aisles by first detecting the vanishing points from the shelf lines and then segmenting products from the shelves, in order to estimate the correct aisle as a function of the quantity and type of products present.

This paper is organized as follows: in Sect. 2 we present some significant related works. Section 3 explains the proposed approach, beginning with the problem of perspective reconstruction and then the segmentation of products on the shelves. Section 4 presents results and evaluations, and finally Sect. 5 gives conclusions and future work.

2 Related Work

The incorporation of visual information into maps has already been tackled by some research groups; for example, Mariottini and Roumeliotis [7] present a strategy for active vision-based localization. They localize the robot in a large visual-memory map using a series of images, and a Bayesian approach is used to reduce the localization ambiguity that arises from very similar images while the robot navigates the environment. However, this approach needs prior information about the places where the robot will navigate, and changes in the shelves would prevent the robot from recognizing the place.

Other works have focused on a hybrid localization approach; for example, Li et al. [5] and the robot HERB [9] use visual landmarks to improve the robot localization. Both present good results, but visual landmarks are not always suitable for real environments, since they can be blocked, damaged or removed by humans.

In [2], Lucas, an autonomous mobile robot developed as a library assistant, is presented. Its localization is based on the fusion of odometry, monocular vision and sonar data, validated with an extended Kalman filter. The robot uses a simple a priori map and does not use pre-recorded images to aid localization.

The robot TOOMAS, presented in [4], is a shopping guide for home improvement stores. It uses a Map-Match-SLAM algorithm [8]: to create the map, the robot navigates through the store and later, based on the store management system, labels the locations of all the articles in the store.

Robocar [3] is another service robot designed as a shopping assistant, in this case for helping blind people do their shopping independently. Here, robot localization is done using RFID tags along the aisles of the supermarket. Even though equipping stores with RFID tags, indoor GPS or other active or passive components could be a practical way to improve localization, most store owners prefer not to install markers [4]; they prefer a simple plug & play solution using only the robot's on-board sensors and advanced navigation techniques. The work presented in [10] demonstrates the importance of identifying and localizing relevant objects and later incorporating them into the robot's map.

In this work, we propose to solve the navigation ambiguity by using the geometry of the aisles to segment the products on the shelves and then, by a probabilistic method, determine the most probable position of the robot in the supermarket in real time, without neglecting its primary user, the client.

3 Perspective Reconstruction

The proposed approach is divided into two main parts: (a) a correct detection of the shelves based on the scene vanishing points, and (b) segmentation of the products from the shelves.

3.1 Detecting Vanishing Points

As vanishing points are invariant to translation and changes in orientation, they may be used to determine the orientation of the robot. In [2] a robot in a library environment (which has many similarities with a supermarket) is presented. In that work the robot's onboard camera is fixed to the robot axis, so the vanishing point of the image indicates the orientation of the robot with respect to the orientation of the bookshelves. Using vanishing point detection, the relationship between 2D line segments in the image and the corresponding 3D orientation in the object plane can be established.

Behan [2] states that "all the dominant oblique lines will share a common vanishing point due to the parallelepiped nature of the environment", and the same characteristic can be expected in the supermarket environment. A popular approach for detecting vanishing points is the one presented by Barnard [1], which uses a Gaussian sphere to determine the vanishing points and find the orientation of parallel lines and planes. In [6], Magee and Aggarwal present another computationally inexpensive approach for detecting vanishing points.

Fig. 2. The robot UVerto; the cameras are placed over a PTU.

However, when the camera of the robot is not fixed, i.e., it is placed over a pan/tilt unit (PTU), the vanishing points of the scene viewed by the robot no longer correspond to the robot's orientation. The robot used in this work (Fig. 2) has its cameras mounted on a PTU. With these cameras, the robot should follow the user to help him when necessary while simultaneously locating itself.

In order to detect vanishing points, the lines corresponding to the shelves are identified. However, the presence of the products on the shelves creates very noisy images, so, in order to obtain a good line detection, a smoothing filter is applied first. A bilateral filter was chosen because it preserves the edges better than the most common filters. Then the Hough line transform is used to find the lines in the image. From the set of lines detected by the algorithm (denoted by L), all the angles are calculated in order to reject vertical lines as well as lines not corresponding to possible shelf orientations. The remaining lines R are then grouped in pairs to calculate their intersection points I. A common problem is that, depending on the complexity of the scene, the number and distribution of the intersection points I can vary greatly, as seen in Fig. 3.
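As an illustration, a minimal sketch of this line-detection stage with OpenCV follows; the filter settings, the Hough parameters and the angle band used to reject vertical and near-horizontal lines are assumptions for illustration, not the values used in this work.

```python
import itertools

import cv2
import numpy as np


def line_intersection(l1, l2):
    """Intersection of two segments treated as infinite lines,
    or None if they are (nearly) parallel."""
    x1, y1, x2, y2 = l1
    x3, y3, x4, y4 = l2
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(den) < 1e-9:
        return None
    px = ((x1 * y2 - y1 * x2) * (x3 - x4)
          - (x1 - x2) * (x3 * y4 - y3 * x4)) / den
    py = ((x1 * y2 - y1 * x2) * (y3 - y4)
          - (y1 - y2) * (x3 * y4 - y3 * x4)) / den
    return (px, py)


def candidate_intersections(img):
    """Smooth the image, detect lines (the set L), reject vertical and
    off-orientation lines to obtain R, and return the pairwise
    intersection points I."""
    # Bilateral filter: flattens product texture while preserving the
    # shelf edges better than common (e.g. Gaussian) filters.
    smooth = cv2.bilateralFilter(img, 9, 75, 75)
    edges = cv2.Canny(cv2.cvtColor(smooth, cv2.COLOR_BGR2GRAY), 50, 150)

    segments = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                               minLineLength=60, maxLineGap=10)
    if segments is None:
        return [], []

    kept = []  # the remaining lines R
    for x1, y1, x2, y2 in segments[:, 0]:
        ang = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        ang = 180 - ang if ang > 90 else ang  # fold to [0, 90]
        if 15 <= ang <= 75:  # keep only oblique (shelf-like) lines
            kept.append((x1, y1, x2, y2))

    # Pairwise intersections I of the remaining lines.
    points = []
    for a, b in itertools.combinations(kept, 2):
        p = line_intersection(a, b)
        if p is not None:
            points.append(p)
    return kept, points
```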

Fig. 3. Examples of the lines and intersection points detected

To address this problem, a methodology similar to RANSAC is proposed to detect the vanishing points in the scene. With the lines R and the intersection points I, the vanishing points can be calculated by the following steps (a code sketch is given after the list):

  • First, the standard deviation of the set of points I is calculated.

  • If the standard deviation of I is smaller than the threshold Th, then the mean of I is taken as a vanishing point and the algorithm ends; otherwise, continue.

  • A random subset C is selected from I.

  • The standard deviation of C is calculated.

  • If the standard deviation of C is smaller than the threshold Th, then the mean of C is taken as a vanishing point and the algorithm ends; otherwise, return to the third step.
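The listed procedure can be sketched as follows; the subset size and the iteration cap are assumptions added so that the sketch always terminates, since the text does not specify them.

```python
import numpy as np


def spread(pts):
    """Scalar dispersion of a point set: norm of per-axis std devs."""
    return float(np.linalg.norm(np.asarray(pts).std(axis=0)))


def ransac_vanishing_point(points, th, subset_size=10, max_iters=500,
                           rng=None):
    """RANSAC-like vanishing point detection over the intersection
    points I, following the steps above. `th` is the standard-deviation
    threshold Th; subset_size and max_iters are assumed safeguards."""
    rng = rng or np.random.default_rng()
    pts = np.asarray(points, dtype=float)
    if len(pts) == 0:
        return None

    # Steps 1-2: if the full set I is already compact, use its mean.
    if spread(pts) < th:
        return pts.mean(axis=0)

    # Steps 3-5: draw random subsets C until one is compact enough.
    for _ in range(max_iters):
        idx = rng.choice(len(pts), size=min(subset_size, len(pts)),
                         replace=False)
        subset = pts[idx]
        if spread(subset) < th:
            return subset.mean(axis=0)
    return None  # no sufficiently compact subset found
```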

After the vanishing point is detected in the image, the robot can segment the shelves into different levels, which we call "slices". Since the robot will hardly ever be moving along the center of the aisle, the vanishing point is used to determine which side of the image will be processed, choosing the side closer to the robot, as it covers a bigger part of the visual field.

The slices are formed by the following steps: from all the lines L in the image, those whose distance to the vanishing point is smaller than a threshold (Th2) are selected and denoted by LF. All the elements in LF are sorted according to their position in the image plane. If the distance between two consecutive lines is smaller than a threshold (Th3), they are considered part of the same shelf and are not processed as a slice.
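A possible reading of this slice-forming step is sketched below; sorting by the vertical midpoint of each line is our assumed interpretation of the ordering in the image plane, and Th2 and Th3 are left as free parameters.

```python
import numpy as np


def point_line_distance(p, line):
    """Perpendicular distance from point p to the infinite line
    through the segment (x1, y1)-(x2, y2)."""
    x1, y1, x2, y2 = line
    px, py = p
    den = np.hypot(y2 - y1, x2 - x1)
    if den == 0:
        return float("inf")
    return abs((y2 - y1) * px - (x2 - x1) * py + x2 * y1 - y2 * x1) / den


def form_slices(lines, vp, th2, th3):
    """Group shelf lines into 'slices'. Lines from L passing within
    Th2 of the vanishing point vp form LF; consecutive lines closer
    than Th3 are treated as the same shelf edge, and the bands
    between distinct edges become slices."""
    lf = [l for l in lines if point_line_distance(vp, l) < th2]

    # Assumed interpretation: order LF by the vertical midpoint of
    # each line in the image plane.
    lf.sort(key=lambda l: (l[1] + l[3]) / 2.0)

    slices = []
    for top, bottom in zip(lf, lf[1:]):
        gap = abs((bottom[1] + bottom[3]) - (top[1] + top[3])) / 2.0
        if gap >= th3:  # distinct shelf edges: the band is a slice
            slices.append((top, bottom))
    return slices
```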

3.2 Recovering Front Covers of Products

In the image sections we refer to as "slices" we can find the products displayed on the shelves; however, a few considerations must be taken into account. First, the image resolution is not the same for all the products: the ones closer to the robot have a higher resolution but can be incomplete, since the lines sometimes hit the upper or lower edge of the image. Second, the slices that give more information are generally the ones closer to the center of the image, i.e., shelves in the center of the image.

Fig. 4. Characteristics of the products that make product segmentation difficult

One intuitive idea for segmenting the products would be to find the vertical lines in the slices, but due to the perspective deformation of the image and the characteristics of the products (the different shapes and sizes of the boxes, the way they are placed on the shelves, etc., as shown in Fig. 4), this detection needs to be improved.

Fig. 5. Examples of the vertical lines detected

From the set of edges previously detected in the whole image, we perform a search to detect vertical lines within the slice. We select the pair of edges that intersect the two lines of a slice and are closest to the edges of the image, as shown in Fig. 5. With these four intersections, the perspective deformation can be corrected.

Figure 6 shows the result of extracting and transforming the products on the shelf. The red lines mark the shelf detected with the previously explained approach, the blue lines are the vertical lines selected to perform the perspective correction, and the green points are the intersection points used to perform a homography transformation.
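A minimal sketch of this rectification with OpenCV, assuming the four intersection points are already ordered (the output rectangle size is an arbitrary choice):

```python
import cv2
import numpy as np


def rectify_slice(img, corners, out_w=400, out_h=120):
    """Warp the quadrilateral delimited by the four intersection
    points (the green points in Fig. 6) to a fronto-parallel
    rectangle. `corners` must be ordered top-left, top-right,
    bottom-right, bottom-left."""
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]],
                   dtype=np.float32)
    # Homography from the 4 point correspondences, then warp.
    h = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, h, (out_w, out_h))
```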

Fig. 6. Examples of the products transformed (Color figure online)

Fig. 7. Example of the products segmentation and analysis

Once this part of the image has been rectified, the "sub-image" of the slice containing the products is processed to find the edges that separate individual products. Tests performed with the Canny and Sobel algorithms were not good enough for this purpose.

To improve this detection, the image is transformed to a different color space and one of its channels is analyzed to extract the edges. In the experiments, the V channel was analyzed for the HSV space and the L channel for the CIE L*a*b* space, with good results.

Figures 7 and 8 show the results of the analysis with the CIE L*a*b* color space and edge detection with the Scharr operator. Finally, histograms of edge points are computed, as can be observed in the figures. The next step is to classify the objects segmented from the shelves: once the robot is able to recognize the products, it can probabilistically infer the aisle in which it is located. A more detailed explanation of the object recognition and its implementation is, however, out of the scope of this paper. At this stage of the work, only these histograms are used to compare the products on the shelves to the ones captured in a previously built database of the supermarket products. The use of SIFT or similar features is not encouraged, since they are suited to static environments.
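A sketch of this last stage, assuming a fixed gradient-magnitude threshold (the text does not give one) and a simple column-wise histogram of edge points:

```python
import cv2
import numpy as np


def edge_profile(slice_img):
    """Edge analysis of a rectified slice: CIE L*a*b* conversion,
    Scharr edges on the L channel, and a per-column histogram of
    edge points used to locate product boundaries and to compare
    against the product database."""
    lab = cv2.cvtColor(slice_img, cv2.COLOR_BGR2Lab)
    l_channel = lab[:, :, 0]

    # Scharr gradients, more accurate than 3x3 Sobel kernels.
    gx = cv2.Scharr(l_channel, cv2.CV_32F, 1, 0)
    gy = cv2.Scharr(l_channel, cv2.CV_32F, 0, 1)
    magnitude = cv2.magnitude(gx, gy)

    # Binarize; the threshold value is an assumed parameter.
    edges = magnitude > 120

    # Column-wise count of edge points: valleys suggest gaps between
    # products, and the profile itself serves as a simple descriptor.
    return edges.sum(axis=0)
```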

Fig. 8. Example of the products segmentation and analysis

4 Experiments and Results

For the experiments, around 3000 images were taken in a real environment at a local supermarket, using a Kinect-like sensor mounted on the robot. The images were captured while the robot was moving through the aisles. No customers were present during the data acquisition; however, we tried to emulate the way a robot would navigate in the environment while helping a customer, avoiding moving in a straight line, as it would need to avoid obstacles, and without facing the shelves directly.

A few images were selected from the total captured in the supermarket for testing our algorithm; these images present small distortion due to movement and include both cases where the vanishing point is evident to a human and cases where it is not.

5 Conclusion

In this paper a method for autonomous service robot localization in the aisles of a supermarket has been presented. The method uses the products on the shelves to compute the probability of being in a certain aisle of the supermarket by counting their number and type. Products are segmented from the shelves based on a simple but efficient RANSAC-like algorithm that determines the vanishing points in the images. The method was tested on real images of a semi-structured environment in the presence of a lot of noise, running at 12 to 15 Hz. We were able to combine existing algorithms with the one we propose for the detection of the vanishing points, in order to identify the main lines of the shelves. Having these "slices" reduces the search space and simplifies the object segmentation. The time consumed by our method makes it possible for the robot to run it in real time. As future work, a further classification of the segmented products is proposed; a review of existing techniques must be performed to select the algorithm that best fits our particular problem, given the image characteristics.