Elsevier

Pattern Recognition Letters

Volume 33, Issue 2, 15 January 2012, Pages 199-217

2D shape representation and similarity measurement for 3D recognition problems: An experimental analysis

https://doi.org/10.1016/j.patrec.2011.09.033

Abstract

One of the most common strategies for tackling the 3D object recognition problem consists of representing objects by their appearance. 3D recognition can therefore be converted into a 2D shape recognition problem. This paper carries out an in-depth qualitative and quantitative analysis of the performance of 2D shape recognition methods when they are used to solve 3D object recognition problems. Well-known shape descriptors (contour- and region-based) and 2D similarity measurements (deterministic and stochastic) are combined to evaluate a wide range of solutions. In order to quantify the efficiency of each approach we propose three parameters: Hard Recognition Rate (Hr), Weak Recognition Rate (Wr) and Ambiguous Recognition Rate (Ar). These parameters open the evaluation to active recognition methods, which deal with uncertainty. Up to 42 combined methods have been tested on two different experimental platforms using public database models. A detailed report of the results and a discussion, including detailed remarks and recommendations, are presented at the end of the paper.

Highlights

► Shape recognition methods applied to 3D object recognition.
► Definition of three new parameters to evaluate shape recognition systems.
► Qualitative and quantitative analysis of a set of shape recognition systems.

Introduction

Three-dimensional object recognition is the process of finding an object in a scene. This task implies determining the object’s identity and/or its pose (position and orientation) with regard to a particular reference frame. For instance, in object manipulation with robots, the pose of the object must be extracted through an accurate estimation of the translation and rotation parameters with regard to the robot coordinate system.

In the field of three-dimensional object recognition using a monocular sensor, two main streams appear: view-based (or appearance-based) approaches and structural (or primitive-based) approaches. Since primitive-based approaches yield low performance when unexpected changes occur in the scene, view-based methods have become a popular representation scheme owing to their robustness to noise, photometric effects, blurred vision and changing illumination. The main advantage of view-based methods is that the image of the query object can be directly compared with a set of images stored in a database, a comparison that is efficient and robust to variations in the scene. The 3D problem thus becomes a 2D shape recognition question in which multiple views of the object, taken from different points of view, have to be handled. Each view in the database is associated with the particular viewpoint (position and orientation) from which it was captured. From here on, we shall use the term ‘shape’ to refer to the appearance of an object from a specific viewpoint – in a 2D context – and ‘object’ as a general word for something in a 3D environment. 3D object pose estimation will denote the geometric transformation between the camera position in the scene and the viewpoint from which the object is viewed in the database, whereas shape pose estimation will concern rotation, translation and scale in a 2D context.
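The view-based scheme just described can be sketched in a few lines of plain Python. The database entries, feature vectors and object names below are purely hypothetical, and the Euclidean distance stands in for whichever similarity measurement is actually used; the point is only that matching a query descriptor against stored views yields both an identity hypothesis and a pose estimate (the stored viewpoint).

```python
import math

# Hypothetical database: each stored view pairs an object identity with the
# viewpoint (azimuth in degrees) it was captured from and a feature vector.
database = [
    {"object": "mug",    "viewpoint": 0,  "features": [0.9, 0.1, 0.4]},
    {"object": "mug",    "viewpoint": 45, "features": [0.7, 0.3, 0.5]},
    {"object": "bottle", "viewpoint": 0,  "features": [0.2, 0.8, 0.6]},
]

def euclidean(a, b):
    """One deterministic similarity choice: Euclidean distance between descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(query_features):
    """Return the (object, viewpoint) of the most similar stored view."""
    best = min(database,
               key=lambda view: euclidean(query_features, view["features"]))
    return best["object"], best["viewpoint"]

obj, viewpoint = recognize([0.85, 0.15, 0.42])
```

In a real system the feature vectors would come from one of the shape descriptors discussed later, but the identity-plus-viewpoint structure of the answer is the same.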

Meanwhile, when a single view is used to recognize an object, the principal problem is that one 2D image frequently provides insufficient information with which to identify the object and correctly estimate its pose. Uncertainty and ambiguity frequently arise in such cases because no depth information is available: different objects may look quite similar from certain viewpoints, which affects the robustness of the 3D recognition system. In active recognition systems this handicap is addressed by moving the camera to different positions and processing several captures of the object until the uncertainty is resolved. Classical active recognition systems are made up of three main stages: the shape recognition algorithm – which concerns shape identification and shape pose estimation in a 2D context; the fusion stage – in which the hypotheses obtained from each sensor position are combined; and the next-best-view planning stage – in which the optimal next sensor positions are computed (González et al., 2008). The last two stages are used to improve active recognition efficiency, thus reducing hypothesis uncertainty.
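A minimal sketch of the three-stage loop above may help fix the idea. Everything here is a toy assumption, not the planning or fusion method evaluated in the paper: the sensor model is a hand-written lookup table, fusion is a simple Bayesian product of likelihoods, and the view planner just picks any viewpoint not yet visited.

```python
# Illustrative active-recognition loop following the three stages named in
# the text; scoring, fusion rule, and view planner are placeholder choices.

def sense(viewpoint):
    """Hypothetical sensor model: per-object likelihoods at a viewpoint."""
    observations = {
        0:  {"mug": 0.5, "bottle": 0.5},   # ambiguous view
        90: {"mug": 0.9, "bottle": 0.1},   # discriminative view
    }
    return observations[viewpoint]

def fuse(belief, likelihood):
    """Fusion stage: combine the current belief with a new observation."""
    fused = {obj: belief[obj] * likelihood[obj] for obj in belief}
    total = sum(fused.values())
    return {obj: p / total for obj, p in fused.items()}

def next_best_view(visited, candidates=(0, 90)):
    """Planning stage: naive heuristic, pick any viewpoint not yet visited."""
    return next(v for v in candidates if v not in visited)

belief = {"mug": 0.5, "bottle": 0.5}   # uniform prior over object identities
visited = []
while max(belief.values()) < 0.8:      # stop once uncertainty is resolved
    view = next_best_view(visited)
    visited.append(view)
    belief = fuse(belief, sense(view))

winner = max(belief, key=belief.get)
```

After the ambiguous first view the belief is unchanged; the second, discriminative view pushes one hypothesis past the confidence threshold and the loop stops, which is exactly the behaviour active recognition relies on.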

Since the view-based strategy converts 3D object recognition into a 2D shape recognition problem, an enormous number of approaches concerning how to represent 2D shapes and how to measure similarity between shapes can be found in the literature (Bustos et al., 2005). However, to the best of our knowledge, no comparative study of different 2D shape recognition algorithms adapted to view-based 3D recognition systems has yet been reported. In order to provide a solution to this issue, the goal of this paper is to carry out an in-depth qualitative and quantitative analysis of the performance of 2D shape recognition methods when they are used to solve 3D object recognition problems.

The paper is structured as follows. In Section 2 we tackle the requirements of 2D shape representation models and compare different representation, identification and shape pose estimation methods to be implemented in 3D applications. Several open questions concerning the performance of shape recognition systems in 3D recognition environments are also discussed. Section 3 presents the statement of the experimental tests developed in Sections 4 and 5, which evaluate recognition performance on the two platforms. Finally, in Section 6 we present a discussion of the experimental results along with our conclusions.

Section snippets

2D shape representation methods

2D shape representation relies on two principal families of descriptors: contour descriptors and region descriptors. Models based on contours are more popular than those based on regions. Contour-based methods necessitate the extraction of boundary information which, in some cases, might not be available. Region-based methods are more robust to noise and do not necessarily rely on shape boundary information, although they do not extract the boundary features of a shape. The desirable
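The contrast between the two descriptor families can be illustrated on a tiny binary shape. The sketch below computes a region-based quantity (the first Hu moment invariant, a standard scale-normalized moment) next to a contour-based one (a centroid-distance signature over the boundary pixels). The shape itself is invented for the example; the formulas are the textbook ones.

```python
import math

shape = [  # binary region, 1 = object pixel
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
pixels = [(x, y) for y, row in enumerate(shape)
                 for x, v in enumerate(row) if v]
pixel_set = set(pixels)

# Region descriptor: central moments -> first Hu invariant (eta20 + eta02).
def raw_moment(p, q):
    return sum((x ** p) * (y ** q) for x, y in pixels)

m00 = raw_moment(0, 0)
cx, cy = raw_moment(1, 0) / m00, raw_moment(0, 1) / m00

def central(p, q):
    return sum(((x - cx) ** p) * ((y - cy) ** q) for x, y in pixels)

hu1 = (central(2, 0) + central(0, 2)) / m00 ** 2  # scale-normalized

# Contour descriptor: distances from the centroid to the boundary pixels
# (a pixel is on the boundary if some 4-neighbour lies outside the region).
def on_boundary(x, y):
    neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return any(n not in pixel_set for n in neighbours)

signature = sorted(math.hypot(x - cx, y - cy)
                   for x, y in pixels if on_boundary(x, y))
```

Note that the region descriptor uses every object pixel, whereas the contour signature exists only where a boundary can be extracted, which is precisely the availability issue raised above.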

Statement of the experimental tests

It is not easy to make an experimental comparison between different recognition methods since each one is tested under different conditions and with different databases. Moreover, the amount of details in each technique makes it impossible to reproduce the experiments in exactly the same way.

Platform setup

The ALOI-VIEW collection consists of 1000 objects recorded under various imaging circumstances. More specifically, the viewing angle, illumination angle and illumination color are systematically varied for each object. In our experiment, we have used a collection of objects imaged from viewing angles spaced 5° apart. Fig. 2 shows an example of an object represented from 72 viewpoints. RMS has been tested on 12 objects (see Fig. 3). Note that the objects selected
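The 72 viewpoints at 5° spacing cover the full 360° turntable rotation. A small helper like the following (illustrative only, not part of the evaluated system) enumerates those azimuths and snaps an arbitrary camera angle to the nearest stored view, taking the wrap-around at 360° into account.

```python
# The 72 ALOI viewpoints at 5-degree steps, plus a nearest-view lookup.
STEP = 5
viewpoints = [STEP * i for i in range(360 // STEP)]  # 0, 5, ..., 355

def nearest_view(azimuth):
    """Nearest database viewpoint to a camera azimuth, with wrap-around."""
    def angular_gap(v):
        d = abs(azimuth - v) % 360
        return min(d, 360 - d)
    return min(viewpoints, key=angular_gap)
```

The wrap-around matters near the seam: an azimuth of 358° is closer to the 0° view (gap 2°) than to the 355° one (gap 3°).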

Platform setup

The objects belonging to the 3DSL dataset have been built in our lab. To do this, a high-accuracy three-dimensional mesh model of each object was obtained in advance by means of a laser scanner. Fig. 11 presents a selection of objects from the 3DSL database. Note that the database is composed of both free-form and polyhedral shapes and even includes some similar objects. For instance, it would appear to be quite difficult to distinguish between objects 6 and 7.

As was previously mentioned, the

Final discussion and conclusions

This paper presents a qualitative and quantitative study of the performance of a set of representative 2D shape recognition strategies when they are used as the pillar of 3D recognition solutions. In order to implement different recognition approaches, we have combined several of the most important 2D shape descriptors with a set of deterministic and stochastic similarity measurements. Up to 42 combinations have been considered. The entire method set has been denoted the RMS

Acknowledgments

This research was supported by the Spanish Government research programme via Projects DPI2009-14024-C02-01 and DPI2009-09956 (MCyT), by the Junta de Comunidades de Castilla-La Mancha via project PCI-08-0135, and by the European Social Fund.

References (48)

  • C.J.C. Burges

    A tutorial on support vector machines for pattern recognition

    Data Min. Knowl. Discov.

    (1999)
  • B. Bustos et al.

    Feature-based similarity search in 3D object databases

    ACM Comput. Surv.

    (2005)
  • D.Y. Chen et al.

    On visual similarity based 3D model retrieval

    Comput. Graph. Forum

    (2003)
  • F.S. Cohen et al.

    Part II: 3-D object recognition and shape estimation from image contours using B-splines, shape invariant matching, and neural network

    IEEE Trans. Pattern Anal. Machine Intell.

    (1994)
  • 3D Synthetic Library (3DSL)....
  • J. Flusser

    Moment invariants in image analysis

    Proc. World Acad. Sci. Eng. Technol.

    (2006)
  • Garcia, E., 2006. Cosine Similarity and Term Weight...
  • J.M. Geusebroek et al.

    The Amsterdam Library of Object Images

    Int. J. Comput. Vision

    (2005)
  • S. Giannarou et al.

    Shape signature matching for object identification invariant to image transformations and occlusion

    Lect. Notes Comput. Sci.

    (2007)
  • M. Hagedoorn et al.

    Reliable and efficient pattern matching using an affine invariant metric

    Int. J. Comput. Vision

    (1999)
  • M.K. Hu

    Visual pattern recognition by moment invariants

    IRE Trans. Inform. Theory

    (1962)
  • Z. Huang et al.

    Affine-invariant B-spline moments for curve matching

    Comput Vision Pattern Recognition

    (1994)
  • D.P. Huttenlocher et al.

    Comparing images using the Hausdorff distance

    IEEE Trans. Pattern Anal. Machine Intell.

    (1993)
  • A. Khotanzad et al.

    Invariant image recognition by Zernike moments

    IEEE Trans. Pattern Anal. Machine Intell.

    (1990)