Elsevier

Pattern Recognition Letters

Volume 33, Issue 2, 15 January 2012, Pages 199-217

2D shape representation and similarity measurement for 3D recognition problems: An experimental analysis

https://doi.org/10.1016/j.patrec.2011.09.033

Abstract

One of the most common strategies for tackling the 3D object recognition problem consists of representing objects by their appearance. 3D recognition can therefore be converted into a 2D shape recognition problem. This paper carries out an in-depth qualitative and quantitative analysis of the performance of 2D shape recognition methods when they are used to solve 3D object recognition problems. Well-known shape descriptors (contour- and region-based) and 2D similarity measurements (deterministic and stochastic) are combined to evaluate a wide range of solutions. In order to quantify the efficiency of each approach we propose three parameters: Hard Recognition Rate (Hr), Weak Recognition Rate (Wr) and Ambiguous Recognition Rate (Ar). These parameters open the evaluation to active recognition methods, which deal with uncertainty. Up to 42 combined methods have been tested on two different experimental platforms using public database models. A detailed report of the results and a discussion, including detailed remarks and recommendations, are presented at the end of the paper.

Highlights

► Shape recognition methods applied to 3D object recognition.
► Definition of three new parameters to evaluate shape recognition systems.
► Qualitative and quantitative analysis of a set of shape recognition systems.

Introduction

Three-dimensional object recognition is the process of finding an object in a scene. This task implies determining the object’s identity and/or its pose (position and orientation) with regard to a particular reference frame. For instance, in object manipulation with robots, the pose of the object must be extracted through an accurate estimation of the translation and rotation parameters with regard to the robot coordinate system.

In the field of three-dimensional object recognition using a monocular sensor, two main streams appear: view-based (or appearance-based) approaches and structural (or primitive-based) approaches. Since primitive-based approaches yield low performance when unexpected changes occur in the scene, view-based methods have become a popular representation scheme owing to their robustness to noise, photometric effects, blurred vision and changing illumination. The main advantage of view-based methods is that the image of the query object can be directly compared with a set of images stored in a database, a comparison that is efficient and robust to variations in the scene. The 3D problem thus becomes a 2D shape recognition question in which multiple views of the object, taken from different points of view, have to be handled. Each view in the database is associated with the particular viewpoint (position and orientation) from which it was captured. From here on, we shall use the term ‘shape’ to refer to the appearance of an object from a specific viewpoint – in a 2D context – and ‘object’ as a general word for something in a 3D environment. 3D object pose estimation will denote the geometric transformation between the camera position in the scene and the viewpoint from which the object is viewed in the database, whereas shape pose estimation will concern rotation, translation and scale in a 2D context.
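The view-based scheme just described can be sketched in a few lines of plain Python. The database entries, feature vectors and object names below are purely hypothetical, and the Euclidean distance stands in for whichever similarity measurement is actually used; the point is only that matching a query descriptor against stored views yields both an identity hypothesis and a pose estimate (the stored viewpoint).

```python
import math

# Hypothetical database: each stored view pairs an object identity with the
# viewpoint (azimuth in degrees) it was captured from and a feature vector.
database = [
    {"object": "mug",    "viewpoint": 0,  "features": [0.9, 0.1, 0.4]},
    {"object": "mug",    "viewpoint": 45, "features": [0.7, 0.3, 0.5]},
    {"object": "bottle", "viewpoint": 0,  "features": [0.2, 0.8, 0.6]},
]

def euclidean(a, b):
    """One deterministic similarity choice: Euclidean distance between descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(query_features):
    """Return the (object, viewpoint) of the most similar stored view."""
    best = min(database,
               key=lambda view: euclidean(query_features, view["features"]))
    return best["object"], best["viewpoint"]

obj, viewpoint = recognize([0.85, 0.15, 0.42])
```

In a real system the feature vectors would come from one of the shape descriptors discussed later, but the identity-plus-viewpoint structure of the answer is the same.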

Meanwhile, when a single view is used to recognize an object, the principal problem is that one 2D image frequently provides insufficient information with which to identify the object and correctly estimate its pose. Uncertainty and ambiguity frequently arise in such cases because no depth information is available: different objects may look quite similar from certain viewpoints, which affects the robustness of the 3D recognition system. In active recognition systems this handicap is addressed by moving the camera to different positions and processing several captures of the object until the uncertainty is resolved. Classical active recognition systems are made up of three main stages: the shape recognition algorithm – which concerns shape identification and shape pose estimation in a 2D context; the fusion stage – in which the hypotheses obtained from each sensor position are combined; and the next-best-view planning stage – in which the optimal next sensor positions are computed (González et al., 2008). The last two stages are used to improve active recognition efficiency, thus reducing hypothesis uncertainty.
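A minimal sketch of the three-stage loop above may help fix the idea. Everything here is a toy assumption, not the planning or fusion method evaluated in the paper: the sensor model is a hand-written lookup table, fusion is a simple Bayesian product of likelihoods, and the view planner just picks any viewpoint not yet visited.

```python
# Illustrative active-recognition loop following the three stages named in
# the text; scoring, fusion rule, and view planner are placeholder choices.

def sense(viewpoint):
    """Hypothetical sensor model: per-object likelihoods at a viewpoint."""
    observations = {
        0:  {"mug": 0.5, "bottle": 0.5},   # ambiguous view
        90: {"mug": 0.9, "bottle": 0.1},   # discriminative view
    }
    return observations[viewpoint]

def fuse(belief, likelihood):
    """Fusion stage: combine the current belief with a new observation."""
    fused = {obj: belief[obj] * likelihood[obj] for obj in belief}
    total = sum(fused.values())
    return {obj: p / total for obj, p in fused.items()}

def next_best_view(visited, candidates=(0, 90)):
    """Planning stage: naive heuristic, pick any viewpoint not yet visited."""
    return next(v for v in candidates if v not in visited)

belief = {"mug": 0.5, "bottle": 0.5}   # uniform prior over object identities
visited = []
while max(belief.values()) < 0.8:      # stop once uncertainty is resolved
    view = next_best_view(visited)
    visited.append(view)
    belief = fuse(belief, sense(view))

winner = max(belief, key=belief.get)
```

After the ambiguous first view the belief is unchanged; the second, discriminative view pushes one hypothesis past the confidence threshold and the loop stops, which is exactly the behaviour active recognition relies on.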

Since the view-based strategy converts 3D object recognition into a 2D shape recognition problem, an enormous number of approaches concerning how to represent 2D shapes and how to measure similarity between shapes can be found in the literature (Bustos et al., 2005). However, to the best of our knowledge, no comparative study of different 2D shape recognition algorithms adapted to view-based 3D recognition systems has yet been reported. In order to provide a solution to this issue, the goal of this paper is to carry out an in-depth qualitative and quantitative analysis of the performance of 2D shape recognition methods when they are used to solve 3D object recognition problems.

The paper is structured as follows. In Section 2 we tackle the requirements of 2D shape representation models and compare different representation, identification and shape pose estimation methods to be implemented in 3D applications. Several open questions concerning the performance of shape recognition systems in 3D recognition environments are also discussed. Section 3 presents the statement of the experimental tests developed in Sections 4 and 5, which evaluate recognition performance on the two platforms. Finally, in Section 6 we present a discussion of the experimental results along with our conclusions.

Section snippets

2D shape representation methods

2D shape representation relies on two principal families of descriptors: contour descriptors and region descriptors. Models based on contours are more popular than those based on regions. Contour-based methods necessitate the extraction of boundary information which, in some cases, might not be available. Region-based methods are more robust to noise and do not necessarily rely on shape boundary information, although they do not extract the boundary features of a shape. The desirable
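The contrast between the two descriptor families can be illustrated on a tiny binary shape. The sketch below computes a region-based quantity (the first Hu moment invariant, a standard scale-normalized moment) next to a contour-based one (a centroid-distance signature over the boundary pixels). The shape itself is invented for the example; the formulas are the textbook ones.

```python
import math

shape = [  # binary region, 1 = object pixel
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
pixels = [(x, y) for y, row in enumerate(shape)
                 for x, v in enumerate(row) if v]
pixel_set = set(pixels)

# Region descriptor: central moments -> first Hu invariant (eta20 + eta02).
def raw_moment(p, q):
    return sum((x ** p) * (y ** q) for x, y in pixels)

m00 = raw_moment(0, 0)
cx, cy = raw_moment(1, 0) / m00, raw_moment(0, 1) / m00

def central(p, q):
    return sum(((x - cx) ** p) * ((y - cy) ** q) for x, y in pixels)

hu1 = (central(2, 0) + central(0, 2)) / m00 ** 2  # scale-normalized

# Contour descriptor: distances from the centroid to the boundary pixels
# (a pixel is on the boundary if some 4-neighbour lies outside the region).
def on_boundary(x, y):
    neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return any(n not in pixel_set for n in neighbours)

signature = sorted(math.hypot(x - cx, y - cy)
                   for x, y in pixels if on_boundary(x, y))
```

Note that the region descriptor uses every object pixel, whereas the contour signature exists only where a boundary can be extracted, which is precisely the availability issue raised above.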

Statement of the experimental tests

It is not easy to make an experimental comparison between different recognition methods since each one is tested under different conditions and with different databases. Moreover, the amount of details in each technique makes it impossible to reproduce the experiments in exactly the same way.

Platform setup

The ALOI-VIEW collection consists of 1000 objects recorded under various imaging circumstances. More specifically, the viewing angle, illumination angle and illumination color are systematically varied for each object. In our experiment, we have used a collection of objects imaged from viewing angles spaced 5° apart. Fig. 2 shows an example of an object represented from 72 viewpoints. RMS has been tested on 12 objects (see Fig. 3). Note that the objects selected
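The 72 viewpoints at 5° spacing cover the full 360° turntable rotation. A small helper like the following (illustrative only, not part of the evaluated system) enumerates those azimuths and snaps an arbitrary camera angle to the nearest stored view, taking the wrap-around at 360° into account.

```python
# The 72 ALOI viewpoints at 5-degree steps, plus a nearest-view lookup.
STEP = 5
viewpoints = [STEP * i for i in range(360 // STEP)]  # 0, 5, ..., 355

def nearest_view(azimuth):
    """Nearest database viewpoint to a camera azimuth, with wrap-around."""
    def angular_gap(v):
        d = abs(azimuth - v) % 360
        return min(d, 360 - d)
    return min(viewpoints, key=angular_gap)
```

The wrap-around matters near the seam: an azimuth of 358° is closer to the 0° view (gap 2°) than to the 355° one (gap 3°).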

Platform setup

The objects belonging to the 3DSL dataset have been built in our lab. To do this, a high-accuracy three-dimensional mesh model of each object was obtained in advance by means of a laser scanner. Fig. 11 presents a selection of objects from the 3DSL database. Note that the database is composed of both free-form and polyhedral shapes and even includes some similar objects. For instance, it would appear to be quite difficult to distinguish between objects 6 and 7.

As was previously mentioned, the

Final discussion and conclusions

This paper presents a qualitative and quantitative study of the performance of a set of representative 2D shape recognition strategies when they are used as the pillar of 3D recognition solutions. In order to implement different recognition approaches, we have combined several of the most important 2D shape descriptors with a set of deterministic and stochastic similarity measurements. Up to 42 combinations have been considered. The entire method set has been denoted the RMS

Acknowledgments

This research was supported by the Spanish Government research programme via Projects DPI2009-14024-C02-01 and DPI2009-09956 (MCyT), by the Junta de Comunidades de Castilla-La Mancha via project PCI-08-0135, and by the European Social Fund.

References (48)

  • C.J.C. Burges

    A tutorial on support vector machines for pattern recognition

    Data Min. Knowl. Discov.

    (1999)
  • B. Bustos et al.

    Feature-based similarity search in 3D object databases

    ACM Comput. Surv.

    (2005)
  • D.Y. Chen et al.

    On visual similarity based 3D model retrieval

    Comput. Graph. Forum

    (2003)
  • F.S. Cohen et al.

    Part II: 3-D object recognition and shape estimation from image contours using B-splines, shape invariant matching, and neural network

    IEEE Trans. Pattern Anal. Machine Intell.

    (1994)
  • 3D Synthetic Library (3DSL)....
  • J. Flusser

    Moment invariants in image analysis

    Proc. World Acad. Sci. Eng. Technol.

    (2006)
  • Garcia, E., 2006. Cosine Similarity and Term Weight...
  • J.M. Geusebroek et al.

    The Amsterdam Library of Object Images

    Int. J. Comput. Vision

    (2005)
  • S. Giannarou et al.

    Shape signature matching for object identification invariant to image transformations and occlusion

    Lect. Notes Comput. Sci.

    (2007)
  • M. Hagedoorn et al.

    Reliable and efficient pattern matching using an affine invariant metric

    Int. J. Comput. Vision

    (1999)
  • M.K. Hu

    Visual pattern recognition by moment invariants

    IRE Trans. Inform. Theory

    (1962)
  • Z. Huang et al.

    Affine-invariant B-spline moments for curve matching

    Comput Vision Pattern Recognition

    (1994)
  • D.P. Huttenlocher et al.

    Comparing images using the Hausdorff distance

    IEEE Trans. Pattern Anal. Machine Intell.

    (1993)
  • A. Khotanzad et al.

    Invariant image recognition by Zernike moments

    IEEE Trans. Pattern Anal. Machine Intell.

    (1990)