
Pattern Recognition Letters

Volume 29, Issue 8, 1 June 2008, Pages 1060-1071

Active object recognition based on Fourier descriptors clustering

https://doi.org/10.1016/j.patrec.2007.06.016

Abstract

This paper presents a new 3D object recognition/pose estimation strategy based on Fourier descriptors clustering for silhouettes. The method consists of two parts. First, an off-line process calculates and stores a clustered Fourier descriptors database corresponding to the silhouettes of the synthetic model of the object viewed from multiple viewpoints. Then, an on-line process solves the recognition/pose problem for an object that is sensed by a camera placed at the end of a robotic arm. The method solves the ambiguity problem – due to object symmetries or similar projections belonging to different objects – by taking a minimum number of additional views of the scene, which are selected through a heuristic next best view (NBV) algorithm. The method works under reduced computational time conditions and provides both the identification and the pose of the object. A validation test of this method has been carried out in our lab, yielding excellent results.

Introduction

3D object recognition is a common task in industrial robotic applications such as automatic assembly, inspection or object manipulation. Object recognition involves image data processing in which image-based features or image-computable representations are compared with a stored representation of the object. An active recognition system must control the positions and parameters of the vision sensor to make the recognition task more effective. Indeed, due to the limited information contained in a 2D image, different objects may have the same appearance from certain viewpoints. In such cases, a single view is insufficient to identify an object.

A solution to this ambiguity problem consists of moving the sensor to different viewpoints and processing the new information until the uncertainty is resolved. Since the acquisition and processing of new images is an expensive task, it is desirable to take a minimal number of additional views (Deinzer et al., 2003). Therefore, in order to minimize the cost of the whole recognition process an efficient next best view (NBV) algorithm is mandatory.

There are different cost functions to be minimized in an NBV algorithm. Most of them are related to the movement of the sensor (camera) to a new position. For instance, the distance between the initial and final camera positions and the change in the joint angles of the device that supports the camera (usually a manipulator robot) are two typical cost measures.
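As an illustration of such a cost measure (the weighting and the specific terms below are our own assumptions, not the cost function used in this work), a candidate viewpoint could be scored by combining the Cartesian displacement of the camera with the articular motion of the manipulator:

```python
import numpy as np

def nbv_cost(cam_pos_now, cam_pos_next, joints_now, joints_next,
             w_cartesian=1.0, w_joints=0.5):
    """Illustrative next-best-view cost: weighted sum of the Cartesian
    displacement of the camera and the total joint motion of the robot
    that carries it. The weights are hypothetical tuning parameters."""
    cartesian = np.linalg.norm(np.asarray(cam_pos_next) - np.asarray(cam_pos_now))
    articular = np.sum(np.abs(np.asarray(joints_next) - np.asarray(joints_now)))
    return w_cartesian * cartesian + w_joints * articular

# Hypothetical usage: choose the candidate viewpoint with the lowest cost.
# best = min(candidates, key=lambda v: nbv_cost(p0, v.position, q0, v.joints))
```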

Classical active recognition systems have three main modules whose efficiency is decisive for the overall performance:

  • The object recognition system (identification and pose estimation).

  • The fusion task (combination of the hypotheses obtained from each sensor position).

  • The next best view planning (optimal sensor position to solve the ambiguity problem).

All these modules can be implemented following different strategies but, in general, most research projects use stochastic or probabilistic models (Borotschnig et al., 1999, Aksoy and Haralick, 2000, Deinzer et al., 2006). Various factors affect the strategy used for recognition, such as the sensor type, object type and object representation scheme.

The key to our active recognition system is the use of a reduced set of Fourier descriptors to connect and develop all the recognition phases: object representation, classification, identification, pose estimation, next best view planning and hypothesis fusion.

We focus the object representation on silhouettes because they can be robustly extracted from images, they are insensitive to surface feature variations – such as color and texture – and they easily encode the shape information (Poppe and Poel, 2005). The most popular methods for 2D object recognition from silhouettes are based on invariant moments or Fourier descriptors. Invariant moments have the drawback that two completely different silhouettes may share the same low-order invariant moments, which may lead to ambiguities in the recognition process. Fourier descriptors yield much more information about the silhouette, and only similar silhouettes exhibit similar Fourier descriptors. Since we assume that the objects are non-occluded and the background is uncluttered, we use a representation scheme in which the silhouettes from different viewpoints are represented by their Fourier descriptors.
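As a minimal sketch of this kind of representation (the normalization choices and function names below are ours, following the standard complex-coordinate formulation rather than the exact descriptors used in this paper):

```python
import numpy as np

def fourier_descriptors(contour_xy, n_coeffs=20):
    """Compute Fourier descriptors of a closed silhouette contour given as an
    (N, 2) array of boundary points (illustrative formulation).
    Requires N > 2 * n_coeffs."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # complex boundary sequence
    Z = np.fft.fft(z)
    Z[0] = 0.0                     # drop DC term -> translation invariance
    mags = np.abs(Z)               # drop phase -> rotation / start-point invariance
    mags /= mags[1]                # normalize by first harmonic -> scale invariance
    # keep only the lowest-frequency harmonics (positive and negative)
    return np.concatenate([mags[1:n_coeffs + 1], mags[-n_coeffs:]])
```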

This paper is organized as follows. Section 2 presents a brief state of the art on active recognition based on computer vision. Section 3 presents an overview of the method. Section 4 describes our object identification/pose estimation approach. Section 5 details the next best view method. Section 6 shows the performance of our method through experiments on a real platform, and conclusions are stated in Section 7.

Section snippets

Related research

Previous works on active recognition differ in the way objects are represented, the way information is combined and the way the next observation is planned (Roy et al., 2004). Most 3D representation methods are either model-based (CAD, solid geometry schemes and spatial-occupancy representations) or view-based (appearance-based parametric eigenspaces, aspect graphs and contours). The representation schemes are characterized by a set of local or global features. The identification process matches

Overview of the method

In this method the scene silhouette (silhouette of the 3D object to be recognized) is recognized among a set of silhouettes (in our case, 80 or 320 per object) of a group of objects through an algorithm based on Fourier descriptors clustering. Therefore, the recognition of the silhouette of the scene involves both identification and pose of the object. The method consists of off-line and on-line parts.
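The specific clustering scheme of the method is not reproduced in this snippet; purely as an illustration, a generic grouping of the stored descriptor vectors (using k-means, which is our assumption, not necessarily the clustering used here) could be organized as follows:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_clustered_database(descriptors, labels, n_clusters=32):
    """Illustrative off-line step: group the Fourier descriptor vectors of all
    stored views into clusters so that, on-line, a query descriptor is matched
    first against cluster centroids and then only against the views of the
    closest clusters.
    descriptors : (M, D) array, one row per stored silhouette view
    labels      : list of (object_id, viewpoint) tuples, one per row
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(descriptors)
    database = {c: [] for c in range(n_clusters)}
    for desc, lab, c in zip(descriptors, labels, km.labels_):
        database[c].append((lab, desc))
    return km.cluster_centers_, database
```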

The off-line process consists of building a structured database of silhouettes belonging to a

Statement of the problem

Let us assume a contour l(n) composed of N points on the XY plane:

l(n) = [x(n), y(n)],  n = 0, …, N − 1

where the origin of the index n is an arbitrary point of the curve, and n1 and n1 + 1 are consecutive points according to a given direction (for example, the clockwise direction) over the silhouette. Assume also that the points over the curve have been regularized in the sense that two consecutive points are always at the same Euclidean distance. Let us define the silhouette z by the complex sequence:

z(n) = x(n) + jy(n),  n = 0, …, N − 1
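A brief sketch of this regularization step and of the construction of z(n) (the linear arc-length resampling shown here is a standard choice that we assume, not necessarily the exact procedure of the paper):

```python
import numpy as np

def regularize_contour(contour_xy, n_points=256):
    """Resample a closed contour so that consecutive points are (approximately)
    equally spaced along the curve, then return the complex sequence
    z(n) = x(n) + j*y(n), n = 0, ..., N-1."""
    pts = np.vstack([contour_xy, contour_xy[:1]])        # close the curve
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    s_new = np.linspace(0.0, s[-1], n_points, endpoint=False)
    x = np.interp(s_new, s, pts[:, 0])
    y = np.interp(s_new, s, pts[:, 1])
    return x + 1j * y                                    # silhouette z(n)
```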

Discriminant silhouettes

If only one silhouette (h1, v1) verifies expression (11), i.e. L = {(h1, v1)}, then there is no ambiguity: the object is recognized and its pose is calculated. If more than one silhouette verifies (11), then those silhouettes are similar and, in order to reach a reliable recognition decision, more views of the scene must be taken. In this case, the next pose of the camera is computed through the next best view (NBV) algorithm, which will be presented in this section. Before
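As an illustration of this decision rule (the distance measure and threshold below are placeholders and do not reproduce expression (11)):

```python
import numpy as np

def candidate_silhouettes(query_desc, stored_views, threshold):
    """Return the set L of stored views whose descriptors are close enough to
    the query; placeholder for the matching criterion of expression (11).
    stored_views : iterable of ((object_id, viewpoint), descriptor) pairs."""
    return [(obj, view) for (obj, view), desc in stored_views
            if np.linalg.norm(np.asarray(query_desc) - np.asarray(desc)) < threshold]

# If the list has exactly one element, the object and its pose are resolved;
# if it has more than one, the candidates are ambiguous and a next best view
# must be planned.
```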

Recognition test without noise

A validation test of this method has been carried out in our lab. The experimental setup is composed of a Stäubli RX 90 robot with a Jai-CVM1000 microcamera at its end-effector. This system controls the position and viewing direction of the camera, the object always being centered in the scene. Fig. 7 shows the experimental setup.

In the off-line process, the synthesized models (with 80,000 polygons per object) are built with a Konica Minolta VI-910 3D laser scanner. At the same time the silhouette

Conclusion

This paper has presented a new active recognition system. The system turns a 3D object recognition problem into a multiple silhouette recognition problem in which images of the same object from multiple viewpoints are considered. The properties of Fourier descriptors have been used to carry out the silhouette clustering, matching and pose estimation processes.

Our method implies the use of databases with a very large number of stored silhouettes but an efficient version of the matching process with Fourier

Acknowledgements

This work has been supported by the Spanish projects PBI05-028 (JCCM) and DPI2006-14794-C02-01.

