Elsevier

Pattern Recognition

Volume 64, April 2017, Pages 29-38

Keypoints-based surface representation for 3D modeling and 3D object recognition

https://doi.org/10.1016/j.patcog.2016.10.028

Highlights

  • We propose a novel technique, called Keypoints-based Surface Representation (KSR).

  • The proposed technique does not require local features around detected keypoints.

  • KSR exploits geometrical relationships between keypoints for surface representation.

  • KSR is tested on 3 popular datasets for 3D modeling and 3D object recognition.

  • KSR achieves superior 3D modeling and recognition results.

Abstract

The three-dimensional (3D) modeling and recognition of 3D objects have traditionally been performed using local features to represent the underlying 3D surface. Feature extraction requires cropping several local surface patches around detected keypoints. Although an important step, the extraction and representation of such local patches adds to the computational complexity of the algorithms. This paper proposes a novel Keypoints-based Surface Representation (KSR) technique. The proposed technique has the following two characteristics: (1) It does not rely on the computation of features on a small surface patch cropped around a detected keypoint. Rather, it exploits the geometrical relationships between the detected 3D keypoints for local surface representation. (2) KSR is computationally efficient, requiring only seconds to process 3D models with over 50,000 points with a MATLAB implementation. Experimental results on the UWA and Stanford 3D models datasets suggest that it can accurately perform pairwise and multiview range image registration (3D modeling). KSR was also tested for 3D object recognition with occluded scenes. Recognition results on the UWA dataset show that the proposed technique outperforms existing methods, including 3D-Tensor, VD-LSD, the keypoint-depth based feature, spherical harmonics and the spin image, with a recognition rate of 95.9%. The proposed approach also achieves a recognition rate of 93.5% on the challenging Ca'Foscari dataset, compared to 92.5% achieved by the game-theoretic approach. The proposed method is computationally efficient compared to state-of-the-art local feature methods.

Introduction

The computation of similarity between three-dimensional (3D) surfaces is key to a number of pattern recognition tasks such as 3D modeling and 3D object recognition [1], [2], [3]. The aim of 3D modeling is to measure the similarity between 3D surfaces captured from different viewpoints, and to align and merge them to construct a complete 3D model of an object [4], [5], [6], [7]. The task of 3D object recognition, on the other hand, consists of correctly determining the identity and pose of objects in a scene [8], [9], [10]. Both tasks find vast applications in fields such as robotics [11], [12], [13], reverse engineering [14], scene understanding [15], [16], medicine [17], [18] and biometric systems [19], [20].

For the last two decades, the most popular approach to measuring the similarity between surfaces (for 3D modeling and object recognition) has exploited a compact representation of the 3D surface, known as 3D features [21], [22]. Local correspondences established by matching these features are used to solve higher-level tasks such as automatic 3D modeling and 3D object recognition. Local surface description can address the challenges posed by changes in viewpoint, clutter and occlusion [8], [23]. A variety of 3D surface features have been proposed in the literature. Mian et al. [8] proposed a 3D-Tensor descriptor for 3D object recognition. A local reference frame was first constructed by selecting a pair of vertices that satisfied certain geometric constraints. A 3D-tensor descriptor was then generated by constructing a local 3D grid over the range image and summing the surface areas intersecting each bin of the 3D grid. Osada et al. [24] proposed the shape distribution. In their technique, the shape signature of a 3D model is represented as a probability distribution sampled from a shape function that measures surface properties of the model. Their global representation relies on five shape functions: the angle between three random points, the distance between a fixed point and a random point, the distance between two random points, the square root of the area of the triangle formed by three random points, and the cube root of the volume of the tetrahedron formed by four random points on the 3D surface. These five shape functions are then used to define the shape distribution, whose construction iterates over all the vertices of a 3D mesh. In addition, a good representation requires careful selection of a few parameters, such as the number of samples N, the number of bins B, and the number of vertices V. Johnson and Hebert [25] used the normal n of a keypoint p as the local reference axis to generate a "spin image" descriptor.
They expressed each neighboring point q with two parameters: the radial distance ρ and the signed distance ϱ. They then discretized the ρ-ϱ space into a 2D array accumulator and counted the number of points that fell into the bin indexed by (ρ, ϱ). The 2D array was further bilinearly interpolated to construct the spin image. Taati et al. [26] proposed a Variable-Dimensional Local Shape Descriptor (VD-LSD). They first performed PCA on the covariance matrix of the neighboring points of each point q on the surface, so that each point q had a local reference frame and three eigenvalues (λ1, λ2, λ3). They then calculated a set of position, direction and dispersion properties for each point q. The position properties of q comprised its 3D coordinates expressed in the local reference frame. The direction properties comprised the Euler angles used to register the local reference frame with the global frame. The dispersion properties comprised the three eigenvalues (λ1, λ2, λ3). A subset of these properties was selected using a feature selection algorithm, and the selected properties of all the neighboring points of a keypoint p were finally accumulated into a histogram (i.e., the VD-LSD). Tombari et al. [27] proposed a descriptor named Signature of Histograms of OrienTations (SHOT). They first constructed a local reference frame for a keypoint p and divided the neighborhood space into 3D spherical volumes. They then generated a local histogram for each volume by accumulating the number of points according to the angles between the normal at the keypoint and the normals at the neighboring points. All local histograms were concatenated to form the overall SHOT descriptor. The SHOT descriptor is highly descriptive, computationally efficient and robust to noise; experimental results showed that it outperformed the spin image and point signature at all levels of noise. Darom et al. [28] proposed the Scale Invariant Spin Image (SISI) and Local Depth SIFT (LD-SIFT) descriptors for 3D mesh models. In their technique, the SISI descriptor is computed as a spin image over a local scale, while LD-SIFT represents the vicinity of a keypoint as a depth map.
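The spin-image construction described above can be sketched in a few lines of Python (the paper itself reports only a MATLAB implementation, and shows no code). The sketch below uses simple binning rather than the bilinear interpolation of Johnson and Hebert, and the bin size and grid dimensions are illustrative defaults, not the parameters of the original method.

```python
import numpy as np

def spin_image(p, n, points, bin_size=0.05, n_rho=10, n_zeta=20):
    """Accumulate neighbors of keypoint p into a 2D (rho, zeta) histogram,
    where rho is the radial distance from the axis through p along normal n,
    and zeta is the signed distance along that axis."""
    n = n / np.linalg.norm(n)            # ensure a unit normal
    d = points - p                       # offsets of neighbors from the keypoint
    zeta = d @ n                         # signed distance along the normal
    rho = np.sqrt(np.maximum(np.sum(d * d, axis=1) - zeta**2, 0.0))
    # Discretize the (rho, zeta) space into a 2D array accumulator.
    i = np.floor(rho / bin_size).astype(int)
    j = np.floor((zeta + n_zeta * bin_size / 2) / bin_size).astype(int)
    img = np.zeros((n_rho, n_zeta))
    valid = (i < n_rho) & (j >= 0) & (j < n_zeta)
    np.add.at(img, (i[valid], j[valid]), 1)  # count points per bin
    return img
```

Each neighboring point contributes one count to the bin indexed by its (ρ, ϱ) coordinates; a full implementation would spread that count bilinearly over the four surrounding bins.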

Although existing local-feature-based techniques are accurate and can handle occlusion and clutter, they still suffer from high computational complexity [23], [29]. To overcome these shortcomings, we propose a novel technique, called Keypoints-based Surface Representation (KSR), which does not require the computation of local features from a surface patch cropped around each detected keypoint. Most existing feature extraction techniques crop a local surface around a detected keypoint prior to defining a local feature [21], [23]. In contrast, the proposed KSR exploits the geometrical relationships between the detected keypoints for reliable local surface representation. A major advantage of our approach is that KSR can be used with any state-of-the-art keypoint detector. The proposed technique is also computationally efficient and achieves state-of-the-art 3D modeling and object recognition performance.

The rest of this paper is organized as follows. Section 2 describes the proposed Keypoints-based Surface Representation (KSR) technique. The performance evaluation of the proposed technique with state-of-the-art keypoint detectors, and its robustness to noise and variations in mesh resolution, are presented in Section 3. Section 4 presents our pairwise registration and 3D modeling algorithms based on KSR. The qualitative and quantitative evaluation of the 3D modeling accuracy is presented in Section 5. Section 6 presents our automatic 3D object recognition algorithm. Section 7 reports our object recognition results and comparisons with state-of-the-art recognition techniques, followed by a computational efficiency analysis of the proposed technique. The paper is concluded in Section 8.

Section snippets

Keypoints-based surface representation (KSR)

The proposed methodology for computing the Keypoints-based Surface Representation (KSR) is depicted in Fig. 1, and the technique is illustrated in Fig. 2. Briefly stated, the KSR method operates as follows. In the first step, 3D keypoints are detected. Next, the geometrical relationships between the keypoints are computed by measuring the distances between them. Then, subsets of keypoints are selected based on the minimum distance between them, as illustrated in Fig. 2. KSR is finally computed
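The distance-based grouping of keypoints described above can be sketched as follows. This is only an illustration of the two steps the snippet names (pairwise distances, then minimum-distance subsets): the function name `keypoint_subsets`, the parameter k, and the dense distance matrix are our assumptions, and the sketch does not reproduce the full KSR computation, which the snippet truncates.

```python
import numpy as np

def keypoint_subsets(keypoints, k=3):
    """For each detected 3D keypoint, form a subset consisting of the
    keypoint and its k nearest keypoints (minimum Euclidean distance).
    `keypoints` is an (N, 3) array; returns the (N, N) pairwise distance
    matrix and a list of index subsets."""
    diff = keypoints[:, None, :] - keypoints[None, :, :]
    dist = np.linalg.norm(diff, axis=2)         # pairwise distance matrix
    subsets = []
    for i in range(len(keypoints)):
        order = np.argsort(dist[i])             # nearest first; order[0] == i
        subsets.append(order[:k + 1].tolist())  # the keypoint plus its k nearest
    return dist, subsets
```

For large keypoint sets, the O(N²) distance matrix could be replaced by a k-d tree query; the dense form is kept here for clarity.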

Performance evaluation of KSR

The performance of the proposed technique with different keypoint detectors was evaluated on the Bologna dataset [27], for the task of object recognition. The robustness of KSR was also tested with respect to different levels of noise and varying mesh resolutions. In the following, we briefly describe the dataset, keypoint detectors and the evaluation criteria used for the proposed technique.

3D modeling

To evaluate the performance of the proposed technique, we used KSR to perform range image registration under two different scenarios: pairwise registration, where only two 3D views are used, and multiview registration, in which multiple views of the object, acquired from different viewpoints, are made available in no particular order.
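Whatever the scenario, registering two views ultimately requires estimating a rigid transform from matched keypoints. The snippet does not specify the paper's solver, but the standard least-squares step (the Kabsch algorithm) can be sketched as:

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) aligning matched 3D point sets,
    so that R @ src[i] + t approximates dst[i] (Kabsch algorithm)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)   # centroids
    H = (src - cs).T @ (dst - cd)                 # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t
```

Given noisy correspondences, this estimator is typically wrapped in an outlier-rejection loop such as RANSAC before the views are merged.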

Experimental setting

To evaluate the accuracy of 3D modeling, we performed a qualitative and a quantitative evaluation of the proposed technique on the UWA [8] and Stanford 3D models dataset [42]. The UWA dataset is a popular and widely used dataset for 3D modeling. It contains 16–22 2.5D views of four different objects. Similarly, the Stanford 3D models dataset is also one of the renowned datasets and has served as a benchmark in the evaluation of 3D-Modeling algorithms [8]. It contains various 3D models of the

3D object recognition

In this section, we describe our fully automatic 3D object recognition algorithm, which is based on our novel KSR technique. Fig. 9 shows the block diagram of the proposed 3D object recognition algorithm. Our algorithm goes through two phases: offline training and online recognition. During the offline phase, keypoints are first detected and the KSRs between them are then computed for all 3D models and stored in the object dataset. During the online phase, the KSRs are computed for the given scene.
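The offline/online split described above can be sketched as a simple descriptor-matching loop. The function `recognize`, the flat nearest-neighbor search and the fixed distance threshold are illustrative assumptions standing in for the paper's matching and verification stages, which are not detailed in this snippet.

```python
import numpy as np

def recognize(scene_desc, model_library, threshold=0.5):
    """Match scene descriptors against each model's stored descriptors
    (the offline library) and return models ranked by match count.
    A match is a nearest-neighbor pair whose distance is below threshold."""
    scores = {}
    for name, model_desc in model_library.items():
        # Distance from every scene descriptor to every model descriptor.
        d = np.linalg.norm(scene_desc[:, None, :] - model_desc[None, :, :], axis=2)
        scores[name] = int(np.sum(d.min(axis=1) < threshold))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a full pipeline the top-ranked candidates would then be verified, e.g. by estimating a pose from the matched keypoints and checking its alignment error against the scene.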

3D object recognition experimental results

To evaluate the performance of the proposed technique, we used KSR to perform 3D object recognition on the UWA [8] and the recently proposed Ca'Foscari datasets. The UWA dataset contains five 3D models and 50 real scenes. Each scene contains four or five of the models in the presence of occlusion and clutter.

The Ca'Foscari dataset is composed of 20 models and 150 scenes. Each scene contains 3–5 objects in the presence of occlusion and clutter. It is the largest available challenging high-resolution 3D object

Conclusion

In this paper, we presented a novel technique which is capable of capturing the geometrical relationship between 3D keypoints. In contrast to existing methods, the proposed technique does not rely on the extraction of local surface features around detected keypoints, thus omitting this computationally expensive step. It is also robust to variations in the noise levels and mesh resolutions as demonstrated by experimental results on the Bologna dataset. The proposed technique has been extensively

Acknowledgments

This research is supported by the University of Western Australia (UWA) and Australian Research Council (ARC) grant DP110102166.

Syed Afaq Ali Shah obtained his PhD from the University of Western Australia in the area of computer vision and machine learning. He is currently working as a research associate in the School of Computer Science and Software Engineering, The University of Western Australia, Crawley, Australia. His research interests include 3D object recognition, 3D modeling, deep learning and image processing.

References (47)

  • C. Shi et al.

    End-to-end scene text recognition using tree-structured models

    Pattern Recognit.

    (2014)
  • M.R. Ogiela et al.

    Artificial intelligence structural imaging techniques in visual pattern analysis and medical data understanding

    Pattern Recognit.

    (2003)
  • E. Sesa-Nogueras et al.

    Biometric recognition using online uppercase handwritten text

    Pattern Recognit.

    (2012)
  • S.H. Khan et al.

    Secure biometric template generation for multi-factor authentication

    Pattern Recognit.

    (2015)
  • C. Geng et al.

    Face recognition based on the multi-scale local image structures

    Pattern Recognit.

    (2011)
  • C. Li et al.

    Rapid-transform based rotation invariant descriptor for texture classification under non-ideal conditions

    Pattern Recognit.

    (2014)
  • H. Chen et al.

    3D free-form object recognition in range images using local surface patches

    Pattern Recognit. Lett.

    (2007)
  • B. Taati et al.

    Local shape descriptor selection for object recognition in range data

    Comput. Vis. Image Underst.

    (2011)
  • J. Novatnack, K. Nishino, in: Computer Vision – ECCV 2008, Proceedings of the 10th European Conference on Computer...
  • A.S. Mian et al.

    Three-dimensional model-based object recognition and segmentation in cluttered scenes

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2006)
  • S.A.A. Shah, M. Bennamoun, F. Boussaid, A.A. El-Sallam, On novel local surface description for automatic object...
  • S.A.A. Shah et al.

    Iterative deep learning for image set based face and object recognition

    Neurocomputing

    (2016)


M. Bennamoun received his M.Sc. degree in control theory from Queen's University, Kingston, ON, Canada, and the Ph.D. degree in computer vision from Queensland University of Technology (QUT), Brisbane, Australia. He lectured Robotics at Queen's University and then joined QUT in 1993 as an Associate Lecturer. He is currently a Winthrop Professor and has been the Head of the School of Computer Science and Software Engineering, The University of Western Australia (UWA), Perth, Australia for five years (Feb. 2007–Feb. 2012). He has published over 200 journal and conference publications and secured highly competitive national grants from the Australian Research Council (ARC). His areas of interest include control theory, robotics, obstacle avoidance, object recognition, artificial neural networks, signal/image processing, and computer vision (particularly 3D).

F. Boussaid received the M.S. and Ph.D. degrees in microelectronics from the National Institute of Applied Science (INSA), Toulouse, France, in 1996 and 1999, respectively. He joined Edith Cowan University, Perth, Australia, as a Postdoctoral Research Fellow and a member of the Visual Information Processing Research Group in 2000. He joined the University of Western Australia, Crawley, Australia, in 2005, where he is currently a Professor. His current research interests include smart CMOS vision sensors and image processing.
