Bag of shape descriptor using unsupervised deep learning for non-rigid shape recognition
Introduction
As high-level visual information, shape features are easily memorized and recognized by the human brain, even when objects lose color, brightness, and texture [1], [2], [3]. Because shape is both discriminative and sparse, shape-based object recognition is a fundamental task with various applications, such as robot navigation [3], gesture recognition [4], pedestrian detection [5], and object tracking [6]. Shape descriptors under rigid transformations have been widely studied in computer vision, and most of them are geometry-based or spectrum-based methods. However, forming a discriminative descriptor under large non-rigid shape changes, noise, and occlusion remains difficult [7], [8], [9]. Moreover, lightweight deep learning for 2D shape recognition still demands further exploration. A further challenge is to overcome the structural obstacles between shape features and deep learning, such as irregular topology, orientation ambiguity, and rigid or non-rigid transformations [10], [11]. To tackle these issues, we focus on learning discriminative shape features for non-rigid shape recognition based on deep learning and BoW (Bag of Words).
Traditionally, shape recognition is treated as a classification problem consisting of three steps: feature expression, evaluation metrics, and classification optimization. Feature expression is one of the most important and difficult parts, since it directly affects recognition efficiency and accuracy; our work therefore focuses on it. Over the last decades, many local and global shape descriptors have been proposed to extract discriminative features [11], [12], [13], [14], [15], [16], [17]. Global shape descriptors encode the geometric and spatial attributes of a model into a feature space for subsequent matching. Although some of them achieve encouraging performance, especially in shape retrieval, they struggle under complex conditions such as severe occlusion, large local deformation, and clutter. Hence, they do not perform well across different non-rigid shape classes [18], [19]. By contrast, local methods establish patch-wise or point-wise correspondences among fragments by constructing local feature descriptors, which makes them more effective and robust for incomplete and occluded shapes. Nevertheless, existing local descriptors are hand-crafted from fixed geometric quantities such as normals, curvature, and distances. Moreover, these descriptors are constructed on a single region, which makes them highly sensitive to deformations at different scales and limits their discriminability. From these perspectives, it is highly desirable to explore feature learning models and multi-scale strategies that learn high-level feature patterns for shape recognition.
BoW was originally developed for NLP (natural language processing). In this framework, a highly discriminative feature expression can be formed by encoding the contextual relationships among words. Based on these advantages, many researchers have applied BoW to shape recognition [1], [9], [19], [20]. A pioneering work, BoF (Bag of Features), was introduced first [2]; it treats a shape as a document and represents it by a set of shape words built from contour fragments. Through these shape words, the obtained dictionary serves as the basic primitive for shape representation. Finally, feature coding is used to obtain the final shape representation [21], [22], [23]. These methods are relatively stable and robust to small deformations, occlusion, and noise, and our method is inspired by the success of the BoW framework. However, in all existing BoW methods, shape features are captured with low-level geometric descriptors, such as shape context [11], curvature [18], and skeleton paths [9], [24]. Furthermore, BoW discards the spatial information among high-level shape features, although this information plays an essential role in enhancing the discriminability of feature representations. Different from existing BoW proposals, we employ unsupervised deep learning to learn discriminative features that select correct shape correspondences from contour fragments by analyzing intra-class and inter-class similarities and differences, especially when large numbers of shapes are available for training.
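To make the document/word analogy concrete, the sketch below represents one shape as an L1-normalized histogram of its nearest shape words (hard vector quantization), the baseline encoding that coding schemes such as LLC refine. The function names, the toy three-word dictionary and the 2-D features are hypothetical; in the described pipeline each feature would be a learned descriptor of one contour fragment.

```python
def nearest_word(feature, dictionary):
    """Index of the codeword closest to `feature` (squared Euclidean distance)."""
    return min(range(len(dictionary)),
               key=lambda j: sum((f - d) ** 2 for f, d in zip(feature, dictionary[j])))


def bow_histogram(fragment_features, dictionary):
    """Represent one shape as an L1-normalized histogram of shape-word occurrences."""
    hist = [0.0] * len(dictionary)
    for feat in fragment_features:
        hist[nearest_word(feat, dictionary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Hypothetical toy data: three shape words, four contour-fragment features.
dictionary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
shape = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(bow_histogram(shape, dictionary))  # [0.25, 0.5, 0.25]
```

Hard assignment discards how close a feature is to its word and ignores where the fragment lies on the shape, which is why the approach described here replaces it with LLC coding and SPM pooling.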
In recent years, deep learning has attracted increasing research attention in feature representation. In addition, more intrinsic features of the training data can be obtained in an unsupervised way, for example with GANs (Generative Adversarial Networks) and autoencoders [25], [26], [27]. Our research is also inspired by deep learning and applies related ideas to shape recognition. Unlike natural images or 3D grids, which lie on a regular grid with a clear parameterization, an original 2D shape cannot be fed directly into deep learning models in the way images are fed into CNNs. The main challenges addressed in this paper are: (1) the topology mismatch between irregular shape features and regular deep learning models; (2) the multiple resolutions of contour fragments; (3) the permutation variance caused by the ambiguous orientation of shape features; and (4) the poor performance under rigid or non-rigid transformations of the shape. To tackle these issues, we adopt a context structure in terms of multiple views and a local reference for shape feature learning. Furthermore, this structure does not introduce significant information loss and retains the raw spatial distribution and geometric attributes of the given shape. It generalizes better than traditional local feature descriptors, such as SIFT [23], HOG [24], and LBP [25]. Note that the regular and ordered context structure also enables the lightweight SSAE (Stacked Sparse AutoEncoder) to learn directly from contour fragments.
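The text states that a regular, ordered structure with a local reference is what lets an autoencoder consume irregular contour fragments, but this section does not specify that structure. As a hypothetical stand-in (not the authors' multi-view context structure), the sketch below shows one common way to turn a variable-length fragment into a fixed-length, ordered vector: resample it by arc length, then express the points in a local frame centered on the fragment's chord midpoint, aligned with the chord and scaled by its length. This removes translation, scale and in-plane rotation; the names `resample` and `normalize_fragment` and the default `n=8` are assumptions.

```python
import math


def resample(points, n):
    """Resample an open polyline to n points equally spaced in arc length."""
    seg = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    total = sum(seg)
    out, i, acc = [], 0, 0.0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else (target - acc) / seg[i]
        t = min(max(t, 0.0), 1.0)  # guard against rounding at the ends
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def normalize_fragment(points, n=8):
    """Map a fragment to a fixed-length vector in a chord-aligned local frame."""
    pts = resample(points, n)
    (sx, sy), (ex, ey) = pts[0], pts[-1]
    chord = math.hypot(ex - sx, ey - sy) or 1.0  # avoid division by zero for closed loops
    theta = math.atan2(ey - sy, ex - sx)
    c, s = math.cos(-theta), math.sin(-theta)  # rotation that puts the chord on the x-axis
    mx, my = (sx + ex) / 2, (sy + ey) / 2      # origin at the chord midpoint
    vec = []
    for x, y in pts:
        dx, dy = (x - mx) / chord, (y - my) / chord
        vec.extend([c * dx - s * dy, s * dx + c * dy])
    return vec
```

Because the fragment's start and end points are ordered along the contour, the chord gives an unambiguous reference direction, and a rotated, scaled and shifted copy of the same fragment yields the same 2n-dimensional vector.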
In this paper, we propose a novel shape descriptor framework that extracts highly discriminative features from multi-scale context structures. Our key innovation is to enable irregular, multi-scale contour fragments to be learned effectively by combining the traditional BoW framework with deep learning. To obtain valuable and sufficient feature primitives, we redesign the shape decomposition strategy using geodesic distance. Sparsity and regularization terms are added to the objective function to decompose all context features into robust and compact elements. The learned high-level shape features are then mapped into a new space via LLC (locality-constrained linear coding). Moreover, because BoW lacks spatial information among high-level shape features, we adopt SPM (Spatial Pyramid Matching) with max-pooling [26] to incorporate the spatial correlations of a given shape, so that the final shape representation encodes not only multi-scale structural features but also their spatial dependencies. Few papers apply deep learning to 2D shape recognition. Our main contributions are as follows.
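This section does not give the exact objective, so the sketch below assumes the standard single-layer sparse-autoencoder cost: mean squared reconstruction error, an L2 weight-decay term weighted by λ, and a KL-divergence penalty (weighted by β) that pushes each hidden unit's mean activation toward a target sparsity ρ. The default values of `lam`, `beta` and `rho` and all function names are illustrative assumptions, and inputs are assumed to lie in [0, 1] because the decoder uses a sigmoid.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, b, x):
    """Affine map W x + b with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def kl(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return rho * math.log(rho / rho_hat) + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))


def sparse_ae_loss(X, W1, b1, W2, b2, lam=1e-4, beta=3.0, rho=0.05):
    """Cost of one sparse autoencoder layer; returns (loss, hidden activations)."""
    m = len(X)
    hidden = [[sigmoid(z) for z in matvec(W1, b1, x)] for x in X]   # encoder
    recon = [[sigmoid(z) for z in matvec(W2, b2, h)] for h in hidden]  # decoder
    mse = sum(sum((r - x) ** 2 for r, x in zip(rx, xx))
              for rx, xx in zip(recon, X)) / (2 * m)
    decay = lam / 2 * sum(w ** 2 for W in (W1, W2) for row in W for w in row)
    rho_hat = [sum(h[j] for h in hidden) / m for j in range(len(b1))]  # mean activations
    sparsity = beta * sum(kl(rho, rh) for rh in rho_hat)
    return mse + decay + sparsity, hidden
```

In a stacked SSAE, a second layer of this kind would be trained on the `hidden` activations of the first, layer by layer; gradient computation and the optimizer are omitted here.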
- We design an improved shape decomposition method that takes sufficient and complementary contour fragments as the basic primitives for high-level feature expression. Compared with traditional decomposition methods, the improved contour fragments capture richer and more discriminative information.
- A novel unsupervised learning framework based on the context structure and the SSAE is proposed for shape feature expression; it learns high-level, hierarchical shape features from contour fragments. It also effectively overcomes the structural obstacles between shape features and deep learning, such as irregular topology, orientation ambiguity, and rigid or non-rigid transformations of the shape.
- The coding discriminability of the high-level shape features is verified based on the learned high-level shape dictionary, LLC, and SPM. The final shape feature is represented by highly correlated patterns and spatial relations, capturing both the sparsity of shape words and compact spatial information.
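The third contribution relies on SPM with max-pooling to restore the spatial layout that plain BoW discards. A minimal sketch, assuming each contour fragment has a code vector (for example from LLC) and one representative location normalized to [0, 1] (such as its midpoint; the authors' choice is not given in this section). The pyramid grids (1, 2, 4), pooling over absolute code values, and the name `spm_max_pool` are assumptions.

```python
def spm_max_pool(codes, positions, levels=(1, 2, 4)):
    """Concatenate per-cell max-pooled codes over a spatial pyramid.

    codes: one code vector per contour fragment, all of length K.
    positions: (x, y) location of each fragment, normalized to [0, 1].
    levels: grid sizes; (1, 2, 4) gives 1 + 4 + 16 = 21 cells.
    """
    K = len(codes[0])
    feature = []
    for g in levels:
        cells = [[0.0] * K for _ in range(g * g)]  # empty cells stay all-zero
        for code, (x, y) in zip(codes, positions):
            cx = min(int(x * g), g - 1)  # clamp x == 1.0 into the last column
            cy = min(int(y * g), g - 1)
            cell = cells[cy * g + cx]
            for k in range(K):
                cell[k] = max(cell[k], abs(code[k]))  # max over absolute values
        for cell in cells:
            feature.extend(cell)
    return feature
```

With K shape words and the default pyramid, each shape is described by a 21K-dimensional vector in which the level-0 block is an ordinary max-pooled BoW code and the finer levels record where on the shape each word responds.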
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the details of the proposed unsupervised shape feature learning framework. Section 4 describes how the high-level shape dictionary is learned, and Section 5 details shape coding and pooling. Section 6 presents the experimental parameter setup and result analysis. Finally, Section 7 concludes the paper and outlines future work.
Related work
Three lines of related work are briefly reviewed in this section: (1) hand-crafted shape descriptors, (2) BoW for shape recognition, and (3) deep learning for shape descriptors.
Overview of the proposed descriptor
The proposed method consists of the following four steps, which are also illustrated in Fig. 1.
Contour fragment extraction. First, a set of training shapes is decomposed into informative parts using the improved contour fragments, as marked in Fig. 2(b). In traditional BoW, a text or document is represented as an occurrence-frequency histogram of individual words; by contrast, an improved contour fragment contains both local and global shape information. Therefore, we take
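Fragments that mix local and global information can be obtained, as in bag-of-contour-fragments approaches, by taking the contour arc between every ordered pair of key points. The sketch below assumes the key points (for instance from discrete contour evolution) are already given and uses the along-contour arc length as the geodesic distance; the authors' improved, geodesic-distance-based selection rule is not described in this section and is not reproduced. The name `contour_fragments` is hypothetical.

```python
import math


def contour_fragments(contour, key_idx):
    """Enumerate fragments between every ordered pair of key points on a closed contour.

    contour: list of (x, y) points ordered along a closed boundary.
    key_idx: indices into `contour` of the key points.
    Returns (a, b, points, geodesic_length) tuples, where the fragment runs
    from key point a to key point b in the contour's traversal direction.
    """
    n = len(contour)
    frags = []
    for a in key_idx:
        for b in key_idx:
            if a == b:
                continue
            idx = [a]
            while idx[-1] != b:  # walk forward, wrapping around the closed contour
                idx.append((idx[-1] + 1) % n)
            pts = [contour[k] for k in idx]
            length = sum(math.dist(pts[k], pts[k + 1]) for k in range(len(pts) - 1))
            frags.append((a, b, pts, length))
    return frags
```

With N key points this yields N(N - 1) fragments, ranging from short arcs between neighbouring key points to arcs covering almost the whole boundary, which is the local-to-global spread the text refers to.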
High-level shape dictionary learning
The learned high-level shape feature $h = \{h_k\}, k \in [1, M]$, is generated by the proposed learning framework at each contour fragment, where k is the number of SSAE outputs. More specifically, all features h extracted from contour fragments are collected into a high-level feature set H, such that $H = \{h_k^j\}, k \in [1, M], j \in [1, M]$, where M denotes the number of high-level shape features in the set. To learn the high-level shape feature dictionary , are clustered into clusters, where each
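Although the symbols are lost in this snippet, the text indicates that the collected features H are clustered and that the clusters define the shape words. The sketch below uses plain k-means (Lloyd's algorithm), the usual choice for BoW dictionaries; whether the authors used k-means, and with which dictionary size, is not stated in this section. The naive "first K features" initialization is only for reproducibility (k-means++ is more common in practice), and the name `kmeans` is hypothetical.

```python
import math


def kmeans(features, K, iters=50):
    """Plain Lloyd's k-means; returns K codewords (the shape dictionary)."""
    centers = [list(f) for f in features[:K]]  # naive init: first K features
    for _ in range(iters):
        groups = [[] for _ in range(K)]
        for f in features:  # assignment step: nearest center
            j = min(range(K), key=lambda c: math.dist(f, centers[c]))
            groups[j].append(f)
        new_centers = []
        for j, g in enumerate(groups):  # update step: mean of each group
            if g:
                new_centers.append([sum(col) / len(g) for col in zip(*g)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers
```

Each returned center plays the role of one high-level shape word; the coding step then expresses every fragment's feature in terms of these words.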
Shape encoding and pooling
In BoW, each contour fragment is encoded by mapping its high-level feature into a new space based on the local shape dictionary. In this new space, the informative shape codes express contour fragments better than the raw high-level features do. Inspired by recent works [1], [24], we adopt LLC for encoding, as it has been shown to be effective and robust for object classification. The LLC method is constructed by minimizing the
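The snippet stops before the LLC objective. For reference, LLC as proposed by Wang et al. (2010) minimizes $\sum_i \|x_i - B c_i\|^2 + \lambda \|d_i \odot c_i\|^2$ subject to $\mathbf{1}^\top c_i = 1$, where $d_i$ grows with the distance from $x_i$ to each codeword. The sketch below implements the widely used fast approximation: reconstruct $x$ from its `knn` nearest codewords under the sum-to-one constraint by solving a small regularized linear system. The defaults `knn=5` and `eps=1e-4` and the helper names are assumptions, not values from this paper.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def llc_code(x, dictionary, knn=5, eps=1e-4):
    """Approximated LLC code of feature x over the given dictionary."""
    order = sorted(range(len(dictionary)), key=lambda j: math.dist(x, dictionary[j]))[:knn]
    Z = [[bj - xj for bj, xj in zip(dictionary[j], x)] for j in order]  # shifted bases
    C = [[sum(a * b for a, b in zip(zi, zj)) for zj in Z] for zi in Z]   # local covariance
    tr = sum(C[i][i] for i in range(len(C)))
    for i in range(len(C)):
        C[i][i] += eps * tr  # regularize so the system is well conditioned
    w = solve(C, [1.0] * len(C))
    s = sum(w)
    code = [0.0] * len(dictionary)
    for j, wj in zip(order, w):
        code[j] = wj / s  # enforce the sum-to-one constraint
    return code
```

Only the `knn` nearest shape words receive non-zero weights, so each code is sparse and local; these codes are what a max-pooling step over spatial cells would then aggregate.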
Result and analysis
In this section, we present the parameter setup and performance analysis of the proposed method for shape recognition. We first discuss the parameter setup: by analyzing how the parameters affect shape recognition performance in the experiments, we determine the parameter tuning procedure. Then, the proposed method is compared with state-of-the-art shape recognition approaches on a variety of shape datasets, including the MPEG-7 dataset, the Swedish leaf dataset, Animal
Conclusion
In this paper, a novel bag of shape descriptor based on unsupervised deep learning and BoW is proposed for learning discriminative and compact shape features. Specifically, the improved contour fragments provide abundant basic primitives for high-level shape representation, resulting in local-to-global learning. The low-level shape feature is constructed using the context structure to obtain high-level, hierarchical shape features. This strategy can effectively overcome the obstacles between shape feature
CRediT authorship contribution statement
Linjie Yang: Conceptualization, Methodology, Software, Data curation, Writing - original draft, Investigation, Writing - review & editing. Luping Wang: Conceptualization, Methodology, Software, Data curation, Writing - original draft, Investigation, Writing - review & editing. Yijing Su: Conceptualization, Methodology, Visualization, Investigation, Writing - review & editing. Yin Gao: Conceptualization, Methodology, Visualization, Investigation, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research was supported by the National Science Foundation for Young Scientists of China (Grant No. 61906178), and Science and Technology Program of Quanzhou, China (No. 2019C009R).
References (61)
- et al., Bag of Shape Features with a learned pooling function for shape recognition, Pattern Recognit. Lett. (2018).
- et al., Bag of contour fragments for robust shape classification, Pattern Recognit. (2014).
- et al., Shape recognition by bag of skeleton-associated contour parts, Pattern Recognit. Lett. (2016).
- et al., Improving bag-of-visual-words image retrieval with predictive clustering trees, Inf. Sci. (Ny) (2016).
- et al., Deep learning for 3D point clouds: A survey, IEEE Trans. Pattern Anal. Mach. Intell. (2020).
- et al., Perceptually motivated morphological strategies for shape retrieval, Pattern Recognit. (2012).
- et al., Shape retrieval using triangle-area representation and dynamic space warping, Pattern Recognit. (2007).
- et al., Hierarchical projective invariant contexts for shape recognition, Pattern Recognit. (2016).
- et al., Shape matching and classification using height functions, Pattern Recognit. Lett. (2012).
- et al., 3D shape recognition and retrieval based on multi-modality deep learning, Neurocomputing (2017).
- Neural Bag-of-Features learning, Pattern Recognit.
- Recurrent bag-of-features for visual information analysis, Pattern Recognit.
- Convexity rule for shape decomposition based on discrete contour evolution, Comput. Vis. Image Underst.
- A novel binary shape context for 3D local surface description, ISPRS J. Photogramm. Remote Sens.
- Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching, Pattern Recognit.
- Defect sizing in thin components.
- Robust symbolic representation for shape recognition and retrieval, Pattern Recognit.
- Shape classification using invariant features and contextual information in the bag-of-words model, Pattern Recognit.
- Coordinated navigation of multiple independent disk-shaped robots, IEEE Trans. Robot.
- Low-complexity hand gesture recognition system for continuous streams of digits and letters, IEEE Trans. Cybern.
- Pedestrian detection inspired by appearance constancy and shape symmetry.
- An efficient and robust algorithm for shape indexing and retrieval, IEEE Trans. Multimedia.
- BoSCC: Bag of spatial context correlations for spatially enhanced 3D shape representation, IEEE Trans. Image Process.
- Modeling point clouds with self-attention and gumbel subset sampling.
- Dynamic graph CNN for learning on point clouds, ACM Trans. Graph.
- Shape contexts enable efficient retrieval of similar shapes.
- Unsupervised 3D local feature learning by circle convolutional restricted Boltzmann machine, IEEE Trans. Image Process.
- Unsupervised learning of 3-D local features from raw voxels based on a novel permutation voxelization strategy, IEEE Trans. Cybern.
- Shape vocabulary: A robust and efficient shape representation for shape matching, IEEE Trans. Image Process.
- Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories.
Cited by (7)
- Octagonal lattice-based triangulated shape descriptor engaging second-order derivatives supplementing image retrieval, Journal of Visual Communication and Image Representation (2024).
- An algorithm for extracting similar segments of moving target trajectories based on shape matching, Engineering Applications of Artificial Intelligence (2024).
- A lightweight deep learning model for classification of synthetic aperture radar images, Ecological Informatics (2023).
- A local-global shape characterization scheme using quadratic Bezier triangle aiding retrieval, Digital Signal Processing: A Review Journal (2023).
  Citation excerpt: "The mechanism failed to capitulate local features owing to the drastic variations in the matching phase and hence rendered poor retrieval results. Lately, several models engaging Deep Learning (DL) schemes for achieving improved retrieval rates were offered [29–31]. The Stack Sparse Auto Encoder [29] learned high-level and hierarchical shape features by fusing unsupervised DL with Bag of Words (BoW) for shape discrimination."
- L-shaped geometry-based pattern descriptor serving shape retrieval, Expert Systems with Applications (2023).
  Citation excerpt: "As the l-shaped descriptor is highly localized and congruent, this resulted in acute shape characterization that yielded good recognition accuracy. The slight improvement over the l-Shaped descriptor witnessed for BoF-USDL (L. Yang et al., 2021) is attributed to the hierarchical merging of BoF features by the DL model. Also, the computational dimensions of the diverse BoF feature highly influence the models' complexity undermining its usefulness when extended to real-time environment."