
Pattern Recognition Letters

Volume 29, Issue 8, 1 June 2008, Pages 1060-1071

Active object recognition based on Fourier descriptors clustering

https://doi.org/10.1016/j.patrec.2007.06.016

Abstract

This paper presents a new 3D object recognition/pose estimation strategy based on Fourier descriptors clustering for silhouettes. The method consists of two parts. First, an off-line process calculates and stores a clustered Fourier descriptors database corresponding to the silhouettes of the synthetic model of the object viewed from multiple viewpoints. Then, an on-line process solves the recognition/pose problem for an object that is sensed by a camera placed at the end of a robotic arm. The method solves the ambiguity problem – due to object symmetries or similar projections belonging to different objects – by taking a minimum number of additional views of the scene, which are selected through a heuristic next best view (NBV) algorithm. The method works under reduced computational time conditions and provides both the identification and the pose of the object. A validation test of this method has been carried out in our lab, yielding excellent results.

Introduction

3D object recognition is a common task in industrial robotic applications such as automatic assembly, inspection or object manipulation. Object recognition involves image data processing in which image-based features or image-computable representations are compared with a stored representation of the object. An active recognition system must control the positions and parameters of the vision sensor to make the recognition task more effective. Indeed, due to the limited information contained in a 2D image, different objects may have the same appearance from certain viewpoints. In such cases, a single view is insufficient to identify an object.

A solution to this ambiguity problem consists of moving the sensor to different viewpoints and processing the new information until the uncertainty is resolved. Since the acquisition and processing of new images is an expensive task, it is desirable to take a minimal number of additional views (Deinzer et al., 2003). Therefore, in order to minimize the cost of the whole recognition process an efficient next best view (NBV) algorithm is mandatory.

There are different cost functions to be minimized in an NBV algorithm. Most of them are related to the movement of the sensor (camera) to a new position. For instance, the distance between the initial and final camera positions and the change in the joint angles of the device that supports the camera (usually a manipulator robot) are two typical cost measures.
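As an illustration of such a cost measure (the weighting and the specific terms below are our own assumptions, not the cost function used in this work), a candidate viewpoint could be scored by combining the Cartesian displacement of the camera with the articular motion of the manipulator:

```python
import numpy as np

def nbv_cost(cam_pos_now, cam_pos_next, joints_now, joints_next,
             w_cartesian=1.0, w_joints=0.5):
    """Illustrative next-best-view cost: weighted sum of the Cartesian
    displacement of the camera and the total joint motion of the robot
    that carries it. The weights are hypothetical tuning parameters."""
    cartesian = np.linalg.norm(np.asarray(cam_pos_next) - np.asarray(cam_pos_now))
    articular = np.sum(np.abs(np.asarray(joints_next) - np.asarray(joints_now)))
    return w_cartesian * cartesian + w_joints * articular

# Hypothetical usage: choose the candidate viewpoint with the lowest cost.
# best = min(candidates, key=lambda v: nbv_cost(p0, v.position, q0, v.joints))
```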

Classical active recognition systems have three main modules whose efficiency is decisive for the overall performance:

  • The object recognition system (identification and pose estimation).

  • The fusion task (combination of the hypotheses obtained from each sensor position).

  • The next best view planning (optimal sensor position to solve the ambiguity problem).

All these modules can be implemented following different strategies but, in general, most research projects use stochastic or probabilistic models (Borotschnig et al., 1999, Aksoy and Haralick, 2000, Deinzer et al., 2006). Various factors affect the strategy used for recognition, such as the sensor type, object type and object representation scheme.

The key to our active recognition system is the use of a reduced set of Fourier descriptors to connect and develop all the recognition phases: object representation, classification, identification, pose estimation, next best view planning and hypothesis fusion.

We focus the object representation on silhouettes because they can be robustly extracted from images, they are insensitive to surface feature variations – such as color and texture – and they easily encode the shape information (Poppe and Poel, 2005). The most popular methods for 2D object recognition from silhouettes are based on invariant moments or Fourier descriptors. Invariant moments have the drawback that two completely different silhouettes may share the same low-order invariant moments, which may lead to ambiguities in the recognition process. Fourier descriptors yield much more information about the silhouette, and only similar silhouettes exhibit similar Fourier descriptors. Since we assume that the objects are non-occluded and the background is uncluttered, we use a representation scheme in which the silhouettes from different viewpoints are represented by their Fourier descriptors.
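As a minimal sketch of this kind of representation (the normalization choices and function names below are ours, following the standard complex-coordinate formulation rather than the exact descriptors used in this paper):

```python
import numpy as np

def fourier_descriptors(contour_xy, n_coeffs=20):
    """Compute Fourier descriptors of a closed silhouette contour given as an
    (N, 2) array of boundary points (illustrative formulation).
    Requires N > 2 * n_coeffs."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # complex boundary sequence
    Z = np.fft.fft(z)
    Z[0] = 0.0                     # drop DC term -> translation invariance
    mags = np.abs(Z)               # drop phase -> rotation / start-point invariance
    mags /= mags[1]                # normalize by first harmonic -> scale invariance
    # keep only the lowest-frequency harmonics (positive and negative)
    return np.concatenate([mags[1:n_coeffs + 1], mags[-n_coeffs:]])
```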

This paper is organized as follows. Section 2 presents a brief state of the art on active recognition based on computer vision. Section 3 presents an overview of the method. Section 4 describes our object identification/pose estimation approach. Section 5 details the next best view method. Section 6 shows the performance of our method through experiments on a real platform, and conclusions are stated in Section 7.

Section snippets

Related research

Previous works on active recognition differ in the way objects are represented, the way information is combined and the way the next observation is planned (Roy et al., 2004). Most 3D representation methods are either model-based (CAD, solid geometry schemes and spatial-occupancy representations) or view-based (appearance-based parametric eigenspaces, aspect graphs and contours). The representation schemes are characterized by a set of local or global features. The identification process matches

Overview of the method

In this method the scene silhouette (silhouette of the 3D object to be recognized) is recognized among a set of silhouettes (in our case, 80 or 320 per object) of a group of objects through an algorithm based on Fourier descriptors clustering. Therefore, the recognition of the silhouette of the scene involves both identification and pose of the object. The method consists of off-line and on-line parts.
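The specific clustering scheme of the method is not reproduced in this snippet; purely as an illustration, a generic grouping of the stored descriptor vectors (using k-means, which is our assumption, not necessarily the clustering used here) could be organized as follows:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_clustered_database(descriptors, labels, n_clusters=32):
    """Illustrative off-line step: group the Fourier descriptor vectors of all
    stored views into clusters so that, on-line, a query descriptor is matched
    first against cluster centroids and then only against the views of the
    closest clusters.
    descriptors : (M, D) array, one row per stored silhouette view
    labels      : list of (object_id, viewpoint) tuples, one per row
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(descriptors)
    database = {c: [] for c in range(n_clusters)}
    for desc, lab, c in zip(descriptors, labels, km.labels_):
        database[c].append((lab, desc))
    return km.cluster_centers_, database
```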

The off-line process consists of building a structured database of silhouettes belonging to a

Statement of the problem

Let us assume a contour l(n) composed of N points on the XY plane:

l(n) = [x(n), y(n)],  n = 0, …, N − 1

where the origin of the index n is an arbitrary point of the curve, and n1 and n1 + 1 are consecutive points according to a given direction (for example, the clockwise direction) over the silhouette. Assume also that the points over the curve have been regularized in the sense that two consecutive points are always at the same Euclidean distance. Let us define the silhouette z by the complex sequence:

z(n) = x(n) + jy(n),  n = 0, …, N − 1
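A brief sketch of this regularization step and of the construction of z(n) (the linear arc-length resampling shown here is a standard choice that we assume, not necessarily the exact procedure of the paper):

```python
import numpy as np

def regularize_contour(contour_xy, n_points=256):
    """Resample a closed contour so that consecutive points are (approximately)
    equally spaced along the curve, then return the complex sequence
    z(n) = x(n) + j*y(n), n = 0, ..., N-1."""
    pts = np.vstack([contour_xy, contour_xy[:1]])        # close the curve
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    s_new = np.linspace(0.0, s[-1], n_points, endpoint=False)
    x = np.interp(s_new, s, pts[:, 0])
    y = np.interp(s_new, s, pts[:, 1])
    return x + 1j * y                                    # silhouette z(n)
```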

Discriminant silhouettes

If only one silhouette (h1, v1) verifies expression (11), i.e. L = {(h1, v1)}, then there is no ambiguity: the object is recognized and its pose is calculated. If more than one silhouette verifies (11), then those silhouettes are similar and, in order to reach a reliable recognition decision, more views of the scene must be taken. In this case, the next pose of the camera is computed through the next best view (NBV) algorithm, which will be presented in this section. Before
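As an illustration of this decision rule (the distance measure and threshold below are placeholders and do not reproduce expression (11)):

```python
import numpy as np

def candidate_silhouettes(query_desc, stored_views, threshold):
    """Return the set L of stored views whose descriptors are close enough to
    the query; placeholder for the matching criterion of expression (11).
    stored_views : iterable of ((object_id, viewpoint), descriptor) pairs."""
    return [(obj, view) for (obj, view), desc in stored_views
            if np.linalg.norm(np.asarray(query_desc) - np.asarray(desc)) < threshold]

# If the list has exactly one element, the object and its pose are resolved;
# if it has more than one, the candidates are ambiguous and a next best view
# must be planned.
```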

Recognition test without noise

A validation test of this method has been carried out in our lab. The experimental setup is composed of a Stäubli RX 90 robot with a Jai-CVM1000 microcamera at its end-effector. This system controls the position and viewing direction of the camera, the object always being centered in the scene. Fig. 7 shows the experimental setup.

In the off-line process, the synthesized models (with 80,000 polygons per object) are built with a Konica Minolta VI-910 3D laser scanner. At the same time the silhouette

Conclusion

This paper has presented a new active recognition system. The system turns a 3D object recognition problem into a multiple silhouette recognition problem in which images of the same object from multiple viewpoints are considered. The properties of Fourier descriptors have been used to carry out the silhouette clustering, matching and pose estimation processes.

Our method implies the use of databases with a very large number of stored silhouettes but an efficient version of the matching process with Fourier

Acknowledgements

This work has been supported by the Spanish projects PBI05-028 (JCCM) and DPI2006-14794-C02-01.

