Bag of shape descriptor using unsupervised deep learning for non-rigid shape recognition

doi:10.1016/j.image.2021.116297

Signal Processing: Image Communication

Volume 96, August 2021, 116297

https://doi.org/10.1016/j.image.2021.116297

Highlights

• Our method is specially designed to learn high-level, hierarchical shape features from multi-scale context structures.
• An improved decomposition strategy generates informative contour fragments, enabling local-to-global feature learning.
• An unsupervised learning framework based on the context structure and an SSAE (Stacked Sparse Autoencoder) is applied to the contour fragments for feature expression.

Abstract

Highly discriminative feature expression for non-rigid shape recognition is an important and challenging task that requires shape descriptors that are both abstract and robust. However, most existing low-level descriptors are hand-crafted and are therefore sensitive to local changes and large deformations. To address this issue, this paper proposes a bag of shape descriptor based on unsupervised deep learning and Bag of Words (BoW) for shape recognition. Unlike existing pipelines, our method is specifically designed to learn high-level, hierarchical shape features from multi-scale context structures. In the hierarchical learning of contour fragments, it effectively overcomes obstacles such as irregular topology, orientation ambiguity, and rigid or non-rigid transformations. Specifically, an improved decomposition strategy breaks the shape into a series of informative contour fragments, enabling local-to-global feature learning. An unsupervised learning framework based on the context structure and an SSAE (Stacked Sparse Autoencoder) is then applied to each contour fragment for feature expression. For shape representation, a high-level shape dictionary is learned by K-clustering to achieve discriminative feature coding. In addition, to obtain a compact, simplified shape representation, SPM (Spatial Pyramid Matching) with max-pooling is adopted, which effectively incorporates the spatial layout of the given shape. Experiments demonstrate that the proposed method achieves state-of-the-art performance on several public shape datasets compared with the latest approaches. Our method also performs well under noise and occlusion.

Introduction

As high-level visual information, shape is easy for the human brain to memorize and recognize, even when objects lose their color, brightness, and texture [1], [2], [3]. Because shape descriptions are discriminative and sparse, shape-based object recognition is a fundamental task with applications such as robot navigation [3], gesture recognition [4], pedestrian detection [5], and object tracking [6]. Shape descriptors under rigid transformations have been widely studied in computer vision, and most of them are geometric or spectral methods. However, forming a discriminative descriptor under large non-rigid shape changes, noise, and occlusion remains difficult [7], [8], [9]. Moreover, lightweight deep learning for 2D shape recognition still requires further exploration. It is even more challenging to overcome the structural obstacles between shape features and deep learning, such as irregular topology, orientation ambiguity, and rigid or non-rigid transformations [10], [11]. To tackle these issues, we focus on learning discriminative shape features for non-rigid shape recognition based on deep learning and BoW (Bag of Words).

Traditionally, shape recognition is considered a classification problem consisting of three steps: feature expression, evaluation metrics, and classification optimization. Feature expression is one of the most important and difficult parts, because it directly affects recognition efficiency and accuracy; it is therefore the focus of our work. Over the last decades, many local and global shape descriptors have been proposed to extract discriminative features [11], [12], [13], [14], [15], [16], [17]. Global shape descriptors encode the geometric and spatial attributes of a model into a feature space, where further matching is performed. Although some of these studies have achieved encouraging performance, especially in shape retrieval, they struggle under complex conditions such as severe occlusion, large local deformations, and clutter, and hence do not perform well across different non-rigid shape classes [18], [19]. In contrast, local methods establish patch- or point-wise correspondences among fragments by constructing local feature descriptors, which makes them more effective and robust for incomplete and occluded shapes. Nevertheless, existing local descriptors are hand-crafted from fixed geometric quantities such as normals, curvature, and distances. Moreover, these descriptors are constructed on a single region, which makes them highly sensitive to deformations at different scales and limits their discriminability. From these perspectives, it is highly desirable to explore feature learning models and multi-scale strategies that learn high-level feature patterns for shape recognition.

BoW was originally developed for NLP (natural language processing). In this framework, highly discriminative feature expressions are formed by encoding the context relationships among words. Owing to these advantages, many researchers have applied BoW to shape recognition [1], [9], [19], [20]. In the pioneering BoF (Bag of Features) work [2], a shape is treated as a document and represented by a set of shape words built from contour fragments. The dictionary obtained from these shape words serves as the basic primitive for shape representation, and feature coding is then used to produce the final shape representation [21], [22], [23]. These methods are relatively stable and robust to small deformations, occlusion, and noise, and our method is inspired by the success of the BoW framework. However, existing BoW methods capture shape features with low-level geometric descriptors such as shape context [11], curvature [18], and skeleton paths [9], [24]. Furthermore, BoW discards the spatial information among high-level shape features, even though this information plays an essential role in enhancing the discriminability of feature representations. Unlike existing BoW approaches, we employ unsupervised deep learning to learn discriminative features that select correct shape correspondences from contour fragments by analyzing intra-class and inter-class similarities and differences, especially when a large number of shapes is used for training.
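In its classic hard-assignment form, the BoF coding described above reduces to voting: each contour-fragment descriptor is assigned to its nearest codeword, and the shape is represented by the normalised vote histogram. The NumPy sketch below is a generic illustration of that baseline, not the method of this paper (which replaces it with LLC coding); the function name `bow_histogram` and the toy arrays are hypothetical.

```python
import numpy as np


def bow_histogram(features: np.ndarray, dictionary: np.ndarray) -> np.ndarray:
    """Hard-assignment bag-of-words histogram (illustrative baseline).

    features:   (n, d) array, one descriptor per contour fragment
    dictionary: (K, d) array of codewords ("shape words")
    Returns an L1-normalised histogram of length K.
    """
    # Squared Euclidean distance between every feature and every codeword, shape (n, K)
    d2 = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    # Each fragment votes for its nearest codeword
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=dictionary.shape[0]).astype(float)
    return hist / hist.sum()


dictionary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
features = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
print(bow_histogram(features, dictionary).tolist())  # [0.25, 0.5, 0.25]
```

Hard assignment discards how close a fragment is to the other codewords, which is one reason later BoW pipelines move to soft or locality-constrained coding.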

In recent years, deep learning has attracted increasing research attention in feature representation. In addition, more intrinsic features of the training data can also be obtained in an unsupervised way, for example with GANs (generative adversarial networks) and autoencoders [25], [26], [27]. Our research is also inspired by deep learning and applies related ideas to shape recognition. Unlike natural images or 3D grids, which lie on a regular grid with a clear parameterization, a raw 2D shape cannot be processed by deep learning in the same way that a CNN processes an image. The main challenges addressed in this paper are: (1) the topology mismatch between irregular shape features and regular deep learning models; (2) the multi-resolution nature of contour fragments; (3) the permutation variance caused by the ambiguous orientation of shape features; and (4) poor performance under rigid or non-rigid transformations of the shape. To tackle these issues, we adopt a context structure built from multiple views and a local reference for shape feature learning. This structure does not introduce significant information loss and retains the raw spatial distribution and geometric attributes of the given shape. It generalizes better than traditional local feature descriptors such as SIFT [23], HOG [24], and LBP [25]. Note that the regular, ordered context structure also enables the lightweight SSAE (Stacked Sparse Autoencoder) to learn directly from contour fragments.
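The paragraph above names a stacked sparse autoencoder but this page gives no architecture or hyper-parameters. The NumPy sketch below shows the standard sparse autoencoder objective (reconstruction error, L2 weight decay, and a KL-divergence sparsity penalty with target activation `rho`) trained greedily layer by layer, which is what "stacked" usually means. Layer sizes, `rho`, `beta`, `lam`, the learning rate, and the random matrix standing in for flattened context-structure patches are all assumptions, not values from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sae_cost_grad(params, X, rho=0.05, beta=3.0, lam=1e-4):
    """Cost and gradients of one sparse autoencoder layer.

    X has shape (d, m): one column per training sample, values in [0, 1].
    Cost = mean squared reconstruction error + L2 weight decay
           + beta * KL(rho || average hidden activation).
    """
    W1, b1, W2, b2 = params
    m = X.shape[1]
    a2 = sigmoid(W1 @ X + b1[:, None])      # hidden code, (h, m)
    a3 = sigmoid(W2 @ a2 + b2[:, None])     # reconstruction, (d, m)
    rho_hat = a2.mean(axis=1)               # average activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = (0.5 / m) * np.sum((a3 - X) ** 2) \
        + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) + beta * kl
    # Backpropagation (the sigmoid derivative is a * (1 - a))
    d3 = (a3 - X) * a3 * (1 - a3)
    sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (W2.T @ d3 + sparse[:, None]) * a2 * (1 - a2)
    grads = (d2 @ X.T / m + lam * W1, d2.mean(axis=1),
             d3 @ a2.T / m + lam * W2, d3.mean(axis=1))
    return cost, grads


def train_sae(X, n_hidden, lr=0.1, steps=300, seed=0):
    """Full-batch gradient descent on one layer; returns params and cost history."""
    rng = np.random.default_rng(seed)
    d = X.shape[0]
    r = np.sqrt(6.0 / (d + n_hidden + 1))
    params = [rng.uniform(-r, r, (n_hidden, d)), np.zeros(n_hidden),
              rng.uniform(-r, r, (d, n_hidden)), np.zeros(d)]
    history = []
    for _ in range(steps):
        cost, grads = sae_cost_grad(params, X)
        history.append(cost)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, history


def encode(params, X):
    """Hidden-layer activations, used as the input of the next stacked layer."""
    W1, b1 = params[0], params[1]
    return sigmoid(W1 @ X + b1[:, None])


# Greedy layer-wise stacking on hypothetical data: 200 flattened 8x8 patches
rng = np.random.default_rng(42)
X = rng.random((64, 200))
p1, hist1 = train_sae(X, n_hidden=32)
H1 = encode(p1, X)
p2, hist2 = train_sae(H1, n_hidden=16)
H2 = encode(p2, H1)   # (16, 200): one 16-dimensional feature per sample
```

Training each layer on the previous layer's codes keeps every optimisation problem small; in practice the stack is often fine-tuned end to end afterwards, which this sketch omits.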

In this paper, we propose a novel shape descriptor framework to extract highly discriminative features from multi-scale context structures. Our key innovation is to combine the traditional BoW framework with deep learning so that irregular, multi-scale contour fragments can be learned effectively. To obtain sufficient and informative feature primitives, we redesign the shape decomposition strategy using geodesic distance. Sparsity and regularization terms are added to the objective function so that all context features are decomposed into robust and compact elements. The resulting high-level shape features are then mapped into a new space via LLC (locality-constrained linear coding). Moreover, because BoW discards the spatial information among high-level shape features, we adopt SPM (Spatial Pyramid Matching) with max-pooling [26] to incorporate spatial correlations for a given shape, so that the final shape representation encodes not only multi-scale structural features but also their spatial dependencies. Few papers have applied deep learning to 2D shape recognition. The main contributions of this paper are as follows.

• We design an improved shape decomposition method that takes sufficient and complementary contour fragments as the basic primitives for high-level feature expression. Compared with traditional decomposition methods, the improved contour fragments capture richer and more discriminative information.
• A novel unsupervised learning framework based on the context structure and the SSAE is proposed for shape feature expression, enabling high-level, hierarchical shape features to be learned from contour fragments. It also effectively overcomes structural obstacles between shape features and deep learning, such as irregular topology, orientation ambiguity, and rigid or non-rigid transformations of the shape.
• The coding discriminability of the high-level shape features is verified using the learned high-level shape dictionary, LLC, and SPM. The final shape feature is represented by highly correlated patterns and spatial relations, capturing both the sparsity of shape words and compact spatial information.
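The last contribution pools per-fragment codes with SPM and max-pooling. A minimal NumPy sketch of that pooling step follows, assuming each fragment is represented by one (x, y) reference point normalised to [0, 1] and a pyramid of 1x1, 2x2 and 4x4 grids (a common choice, not confirmed by this page). The name `spm_max_pool` and the toy codes are hypothetical.

```python
import numpy as np


def spm_max_pool(codes, positions, levels=(1, 2, 4)):
    """Spatial pyramid max-pooling of per-fragment codes.

    codes:     (n, K) codes of n contour fragments (e.g. LLC codes)
    positions: (n, 2) one (x, y) reference point per fragment, normalised to [0, 1]
    levels:    grid sizes of the pyramid (1 = whole shape, 2 = 2x2 cells, 4 = 4x4 cells)
    Returns a vector of length K * sum(g * g for g in levels).
    """
    K = codes.shape[1]
    pooled = []
    for g in levels:
        # Grid cell of each fragment; clipping keeps a coordinate of exactly 1.0 in the last cell
        cell = np.clip((positions * g).astype(int), 0, g - 1)
        cell_id = cell[:, 1] * g + cell[:, 0]
        for c in range(g * g):
            members = codes[cell_id == c]
            # Max over the fragments in this cell; empty cells contribute zeros
            pooled.append(members.max(axis=0) if len(members) else np.zeros(K))
    return np.concatenate(pooled)


codes = np.array([[0.2, 0.0, 0.8],
                  [0.5, 0.5, 0.0],
                  [0.0, 0.9, 0.1]])
positions = np.array([[0.1, 0.1], [0.2, 0.3], [0.9, 0.8]])
v = spm_max_pool(codes, positions, levels=(1, 2))
print(v.shape)          # (15,)
print(v[:3].tolist())   # [0.5, 0.9, 0.8]  (level 0: max over the whole shape)
```

Max-pooling keeps the strongest response of each codeword per cell, so the vector length depends only on K and the pyramid, not on how many fragments a shape produces.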

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the details of the proposed unsupervised shape feature learning framework. Section 4 describes how the high-level shape dictionary is learned, and Section 5 details shape coding and pooling. Section 6 presents the experimental parameter setup and result analysis. Finally, Section 7 concludes the paper and outlines future work.


Related work

This section briefly reviews three lines of related work: (1) hand-crafted shape descriptors, (2) BoW for shape recognition, and (3) deep learning for shape descriptors.

Overview of the proposed descriptor

The proposed method consists of the following four steps, which are also illustrated in Fig. 1.

Contour fragment extraction. First, a set of training shapes is decomposed into informative parts using the improved contour fragment strategy, as marked in Fig. 2(b). In traditional BoW, a text or document is represented as an occurrence-frequency histogram of individual words; in contrast, an improved contour fragment contains both local and global shape information. Therefore, we take
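The exact rule of the improved, geodesic-distance-based decomposition is not given in this snippet. As a point of reference, the NumPy sketch below shows the generic bag-of-contour-fragments idea: given keypoints on a closed contour (for example, vertices kept by a discrete-contour-evolution simplification), extract the fragment between every ordered pair of keypoints and measure its geodesic (along-the-contour) length. The functions `contour_fragments` and `arc_length` and the toy square are hypothetical, and any filtering of "valuable" fragments is left out.

```python
import numpy as np


def arc_length(points):
    """Geodesic (along-the-contour) length of an open polyline given as (n, 2) points."""
    return float(np.linalg.norm(np.diff(points, axis=0), axis=1).sum())


def contour_fragments(contour, keypoints):
    """Fragments between every ordered pair of keypoints on a closed contour.

    contour:   (N, 2) points in traversal order; the last point connects back to the first
    keypoints: indices into `contour` (e.g. vertices kept by a polygon simplification)
    Returns (i, j, points, geodesic_length) tuples; fragment (i, j) runs forward
    from i to j, so (i, j) and (j, i) together cover the whole contour.
    """
    N = len(contour)
    fragments = []
    for i in keypoints:
        for j in keypoints:
            if i == j:
                continue
            idx = np.arange(i, i + (j - i) % N + 1) % N   # wrap around the closed contour
            pts = contour[idx]
            fragments.append((i, j, pts, arc_length(pts)))
    return fragments


# A 2x2 square sampled at 8 points; the 4 corners act as keypoints
square = np.array([[0, 0], [1, 0], [2, 0], [2, 1],
                   [2, 2], [1, 2], [0, 2], [0, 1]], dtype=float)
frags = contour_fragments(square, keypoints=[0, 2, 4, 6])
lengths = {(i, j): L for i, j, _, L in frags}
print(len(frags))                        # 12
print(lengths[(0, 2)], lengths[(2, 0)])  # 2.0 6.0
```

Using every keypoint pair mixes short, local fragments with long, near-global ones, which is how a fragment set can carry both local and global shape information.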

High-level shape dictionary learning

The learned high-level shape feature $\mathbf{h} = \{h^{k} \mid k \in [1, M]\}$ is generated by the proposed learning framework at each contour fragment, where $k$ indexes the SSAE outputs. More specifically, all the $h^{k}$ extracted from contour fragments are collected into a high-level feature set $H = \{h^{k}_{j} \mid k \in [1, M],\ j \in [1, M^{k}_{V}]\}$, where $M^{k}_{V}$ denotes the number of high-level shape features in the set for index $k$. To learn the high-level shape feature dictionary $\Phi$, the features $h^{k}_{j}$ are clustered into $K_{H}$ clusters, where each
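The snippet states that the pooled features are clustered into $K_{H}$ clusters but is cut off before naming the algorithm. Assuming plain k-means (the usual choice in BoW pipelines, not confirmed here), the NumPy sketch below learns a $K_{H}$-word dictionary with Lloyd iterations and a farthest-point initialisation. The function names, the iteration cap, and the synthetic blobs standing in for $H$ are hypothetical.

```python
import numpy as np


def farthest_point_init(H, K, seed=0):
    """Start from a random sample, then repeatedly add the sample
    farthest from all centres chosen so far."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(H)))]
    d2 = ((H - H[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(K - 1):
        chosen.append(int(d2.argmax()))
        d2 = np.minimum(d2, ((H - H[chosen[-1]]) ** 2).sum(axis=1))
    return H[chosen].copy()


def kmeans_dictionary(H, K, iters=100, seed=0):
    """Lloyd's k-means on the pooled feature set H (n, d); returns (K, d) codewords."""
    centers = farthest_point_init(H, K, seed)
    for _ in range(iters):
        # Assign every feature to its nearest codeword
        d2 = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each codeword to the mean of its members (keep it if the cluster is empty)
        new = np.array([H[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers


# Hypothetical feature set: three tight blobs standing in for high-level features
rng = np.random.default_rng(1)
H = np.vstack([rng.normal(loc, 0.05, size=(30, 2)) for loc in ([0, 0], [3, 3], [0, 3])])
dictionary = kmeans_dictionary(H, K=3)   # one codeword near each blob centre
```

Farthest-point seeding avoids the common failure where two initial centres land in the same cluster; k-means++ is a randomised alternative with the same goal.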

Shape encoding and pooling

In the BoW framework, each contour fragment is encoded by mapping its high-level feature into a new space based on the local shape dictionary. In this new space, the informative shape code expresses the contour fragment better than the raw high-level feature does. Inspired by recent work [1], [24], we adopt LLC for encoding, as it has been shown to be effective and robust for object classification. The LLC method is constructed by minimizing the
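The sentence defining the LLC objective is cut off in this snippet. For reference, the fast approximation published with LLC (Wang et al., CVPR 2010) restricts each code to the `knn` nearest codewords and solves a small regularised least-squares problem under a sum-to-one constraint. The NumPy sketch below implements that approximation; the defaults `knn=5` and `beta=1e-4` and the toy dictionary are assumptions, not values from this paper.

```python
import numpy as np


def llc_encode(X, B, knn=5, beta=1e-4):
    """Approximated locality-constrained linear coding.

    X: (n, d) features to encode; B: (K, d) dictionary.
    Each code has at most `knn` non-zeros, placed on the nearest codewords, and sums to 1.
    """
    n, K = X.shape[0], B.shape[0]
    codes = np.zeros((n, K))
    d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :knn]          # indices of the local bases
    for i in range(n):
        z = B[nn[i]] - X[i]                        # shift the local bases to the feature
        C = z @ z.T                                # local covariance, (knn, knn)
        C += np.eye(knn) * beta * np.trace(C)      # regularise: C is singular when knn > d
        w = np.linalg.solve(C, np.ones(knn))
        codes[i, nn[i]] = w / w.sum()              # enforce the sum-to-one constraint
    return codes


B = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])  # toy dictionary
X = np.array([[0.25, 0.25], [5.4, 5.1]])
codes = llc_encode(X, B, knn=3)
print(np.round(codes[0], 3).tolist())   # approximately [0.5, 0.25, 0.25, 0.0, 0.0]
```

Because only the nearest codewords receive weight, similar fragments get similar sparse codes, which is the locality property that the subsequent max-pooling relies on.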

Result and analysis

In this section, we present the parameter setup and a performance analysis of the proposed method for shape recognition. We first discuss the parameter setup: by analyzing experimentally how the parameters affect recognition performance, we determine the parameter-tuning procedure. Then, the proposed method is compared with state-of-the-art shape recognition approaches on a variety of shape datasets, including the MPEG-7 dataset, the Swedish leaf dataset, Animal

Conclusion

In this paper, a novel bag of shape descriptor based on unsupervised deep learning and BoW is proposed for learning discriminative and compact shape features. Specifically, the improved contour fragments provide abundant basic primitives for high-level shape representation, enabling local-to-global learning. The low-level shape features are organized into a context structure, from which high-level, hierarchical shape features are learned. This strategy can effectively overcome the obstacles between shape feature

CRediT authorship contribution statement

Linjie Yang: Conceptualization, Methodology, Software, Data curation, Writing - original draft, Investigation, Writing - review & editing. Luping Wang: Conceptualization, Methodology, Software, Data curation, Writing - original draft, Investigation, Writing - review & editing. Yijing Su: Conceptualization, Methodology, Visualization, Investigation, Writing - review & editing. Yin Gao: Conceptualization, Methodology, Visualization, Investigation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by the National Science Foundation for Young Scientists of China (Grant No. 61906178) and the Science and Technology Program of Quanzhou, China (No. 2019C009R).

References

Shen W. et al., Bag of Shape Features with a learned pooling function for shape recognition, Pattern Recognit. Lett. (2018).
Wang X. et al., Bag of contour fragments for robust shape classification, Pattern Recognit. (2014).
Shen W. et al., Shape recognition by bag of skeleton-associated contour parts, Pattern Recognit. Lett. (2016).
Dimitrovski I. et al., Improving bag-of-visual-words image retrieval with predictive clustering trees, Inf. Sci. (Ny) (2016).
Guo Y. et al., Deep learning for 3D point clouds: A survey, IEEE Trans. Pattern Anal. Mach. Intell. (2020).
Hu R.X. et al., Perceptually motivated morphological strategies for shape retrieval, Pattern Recognit. (2012).
Alajlan N. et al., Shape retrieval using triangle-area representation and dynamic space warping, Pattern Recognit. (2007).
Jia Q. et al., Hierarchical projective invariant contexts for shape recognition, Pattern Recognit. (2016).
Wang J. et al., Shape matching and classification using height functions, Pattern Recognit. Lett. (2012).
Bu S. et al., 3D shape recognition and retrieval based on multi-modality deep learning, Neurocomputing (2017).
Passalis N. et al., Neural Bag-of-Features learning, Pattern Recognit. (2017).
Krestenitis M. et al., Recurrent bag-of-features for visual information analysis, Pattern Recognit. (2020).
Latecki L.J. et al., Convexity rule for shape decomposition based on discrete contour evolution, Comput. Vis. Image Underst. (1999).
Dong Z. et al., A novel binary shape context for 3D local surface description, ISPRS J. Photogramm. Remote Sens. (2017).
Attalla E. et al., Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching, Pattern Recognit. (2005).
Roy O. et al., Defect sizing in thin components.
Daliri M.R. et al., Robust symbolic representation for shape recognition and retrieval, Pattern Recognit. (2008).
Ramesh B. et al., Shape classification using invariant features and contextual information in the bag-of-words model, Pattern Recognit. (2015).
Karagöz C.S. et al., Coordinated navigation of multiple independent disk-shaped robots, IEEE Trans. Robot. (2014).
Poularakis S. et al., Low-complexity hand gesture recognition system for continuous streams of digits and letters, IEEE Trans. Cybern. (2016).
Cao J. et al., Pedestrian detection inspired by appearance constancy and shape symmetry.
Biswas S. et al., An efficient and robust algorithm for shape indexing and retrieval, IEEE Trans. Multimedia (2010).
Han Z., BoSCC: Bag of spatial context correlations for spatially enhanced 3D shape representation, IEEE Trans. Image Process. (2017).
Yang J., Modeling point clouds with self-attention and gumbel subset sampling.
Wang Y. et al., Dynamic graph CNN for learning on point clouds, ACM Trans. Graph. (2019).
Mori G. et al., Shape contexts enable efficient retrieval of similar shapes.
Han Z. et al., Unsupervised 3D local feature learning by circle convolutional restricted Boltzmann machine, IEEE Trans. Image Process. (2016).
Han Z. et al., Unsupervised learning of 3-D local features from raw voxels based on a novel permutation voxelization strategy, IEEE Trans. Cybern. (2019).
Bai X. et al., Shape vocabulary: A robust and efficient shape representation for shape matching, IEEE Trans. Image Process. (2014).
Lazebnik S. et al., Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories.

