1 Introduction

With the development of computer graphics, 3D shapes play an important role in many domains with a wide range of applications, making accurate 3D shape classification and retrieval necessary. In recent years, with remarkable advances in deep learning, various network structures have been proposed for 3D shape classification and retrieval, such as 3D ShapeNets [35], PointNet [3], and RotationNet [12]. Notably, view-based methods have achieved the best performance so far. Using deep learning to extract view descriptors typically means exploiting well-established models such as VGGNet [28], GoogLeNet [32], and ResNet [9]. Although deep learning methods for 2D views have been well investigated [19, 36], the structural relationships and local details of 3D shapes remain underexplored. Directly deploying these methods to non-rigid 3D shape classification and retrieval may lead to poor performance because non-rigid 3D shapes are more complex and changeable. Compared with rigid 3D shapes such as desks, chairs, and beds, the structure of non-rigid 3D shapes such as ants, cats, and humans is more complex. In addition, non-rigid 3D shapes undergo a variety of postural changes, so that different shapes can appear very similar in certain postures. Therefore, the classification and retrieval of non-rigid 3D shapes are more difficult and challenging.

Fig. 1. A classification and retrieval framework for non-rigid 3D shapes based on fused views.

To tackle this issue, we propose an FVCNN framework. It contains three functional modules: a projection module, a feature coding module, and a descriptor generation module, as shown in Fig. 1. First, the projection module follows the principles of the human visual system, constructing an efficient coordinate system to project the vertices in the observable region of the 3D shape onto a 2D plane. Then, a feature coding module extracts the NS and SR features as the pixel values of the 2D plane, so that two kinds of views are generated. Since the views contain the local details and structural relationships of the 3D shape, they can describe it comprehensively if their content features are explored efficiently. Finally, we use a CNNs fusion module to extract the features of the two views and fuse them to further extract deep fusion features as the 3D shape descriptors. We evaluate the proposed method on SHREC, and the experimental results show that it outperforms state-of-the-art methods.

The main contributions of this paper are as follows:

  • We propose a projection and feature encoding module to generate the NS and SR views, which contain the local details and structural relationships of the 3D shape, allowing the content features to be explored efficiently such that the views can comprehensively describe the 3D shapes.

  • We develop a CNNs fusion module to extract features from the views, fuse them, and extract deep fusion features as the 3D shape descriptors. Deep fusion features improve the expressive power of the shape descriptor and overcome the limitations of a single feature.

2 Related Works

In earlier research on 3D shape classification and retrieval, the features of 3D shapes can be divided into five categories: statistical features [21, 30], view features [4, 24], topological features [2, 16], function transformation features [5, 17], and fusion features [27, 40]. The key issues with these features are their weak descriptive ability and the expensive computation required to obtain them. It has been generally realized that obtaining efficient features is the key to the classification and retrieval of 3D shapes. Recently, deep learning has been applied in many fields and achieved satisfying results. The deep feature of a 3D shape is a comprehensive one that integrates the characteristics of all aspects of the object [8, 15, 18, 22, 33, 37, 39, 41]. Introducing deep learning into 3D shape classification and retrieval has therefore become a hot research topic. There are two categories of approaches for CNN-based 3D shape classification: voxel-based and 2D image-based.

  (1) Voxel-based approaches

    Charles et al. [3] introduced a hierarchical neural network called PointNet++, which applies PointNet recursively on a nested partition of the input point set. By exploiting metric space distances, the network learns local features with increasing contextual scales. Experimental results show that PointNet++ can learn deep point set features efficiently and robustly. Luciano et al. [41] brought forth a deep learning framework for efficient 3D shape classification based on geodesic moments. It uses a two-layer stacked sparse autoencoder to learn deep features from geodesic moments by training the hidden layers individually in an unsupervised fashion, followed by a softmax classifier. Ren et al. [23] developed a 2D multilayer dense representation (MDR) of 3D volumetric data to extract a concise and informative shape description, and designed a novel adversarial network to jointly train a CNN, a recurrent neural network (RNN), and an adversarial discriminator. The method improved the efficiency and effectiveness of 3D volumetric data processing.

  (2) 2D image-based approaches

    Bai et al. [1] presented GIFT, a real-time 3D shape retrieval engine based on projection images of 3D shapes, which combines GPU acceleration with an inverted file (twice). As a result, the method achieves very high time efficiency: every retrieval task can be finished within one second. Sinha et al. [29] converted the 3D shape into a geometry image so that standard CNNs can be used to learn 3D shapes directly: by projecting and cutting the spherically parameterized shape, the original 3D shape is transformed into a flat, regular geometry image from which the shape descriptor is extracted by CNNs. Shi et al. [26] introduced DeepPano, a rotation-invariant deep representation for 3D shape classification and retrieval. A variant of CNN is specifically designed to learn the deep representation directly from panoramic views; unlike a typical CNN, a row-wise max-pooling layer is inserted between the convolution and fully-connected layers, making the learned representation invariant to rotation around the principal axis. Su et al. [31] proposed a CNN architecture that combines information from multiple views of a 3D shape into a single compact shape descriptor, offering better recognition performance; the same architecture can also recognize human-drawn sketches accurately.

3 Methodology

3.1 Overview

Fig. 2. Schematic diagram of visual view generation.

In the human visual system, people recognize an object by observing its local details and structural relationships. Motivated by this observation, we propose a novel method to generate views. The views are formed by projecting the vertices in the visible area of the 3D shape onto a 2D plane, where the object features are encoded as pixel values, as shown in Fig. 2. We extract features from the views and then fuse them to extract deep fusion features as the descriptors of a 3D shape through the CNNs fusion module. The architecture of the method contains the following main steps, as illustrated in Fig. 1.

Step 1: Projection module for non-rigid 3D shape

To ensure the consistency of the extracted features, the 3D shape is preprocessed by the method described in [34], which eliminates the influence of rotation and translation. The 3D shape is then enclosed by a sphere. A projection module similar to a visual imaging system is established, and part of the vertices of the 3D shape are projected onto the 2D plane.
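The paper relies on [34] for this normalization; as a generic, hedged stand-in (not the cited method), the preprocessing could look like the NumPy sketch below, where the function name normalize_shape and the PCA-based alignment are our own assumptions.

```python
import numpy as np

def normalize_shape(V, r=1.0):
    """Generic pose normalization (a stand-in for the preprocessing of [34]).

    V : (N, 3) vertex coordinates of the mesh.
    Returns vertices centered at the centroid, PCA-aligned (dominant axis on Z,
    matching the major axis of Sect. 3.2), and scaled into a sphere of radius r.
    """
    C = V - V.mean(axis=0)                              # translate centroid to the origin
    # Rotate so the principal axes become the coordinate axes.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)    # rows of Vt: principal directions
    A = C @ Vt.T                                        # columns ordered by decreasing variance
    A = A[:, [2, 1, 0]]                                 # put the dominant axis on Z
    return A * (r / np.linalg.norm(A, axis=1).max())    # fit inside the radius-r sphere
```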

Step 2: Feature coding module for the views

In this step, we propose a feature coding module. The NS and SR features of the 3D shape are coded as pixel values on the view plane, from which the views are generated.

Step 3: CNNs fusion module for non-rigid 3D shape

A feature extractor combining view-pooling and CNNs is developed to extract the 3D shape descriptors. The module is iteratively updated by training until the number of iterations reaches a given threshold or the module converges.

We will describe the design and analysis details for each key part of the model in the following sections.

Fig. 3. Schematic diagram of the projection module.

3.2 Projection Module for Non-rigid 3D Shape

We first define a sphere of radius r that encloses the 3D shape and establish a coordinate system, as shown in Fig. 3. The center of the sphere is the coordinate origin O(0,0,0), which is also the centroid of the 3D shape, and the Z axis is the major axis of the coordinate system. The view plane is set up perpendicular to the Z axis, with size \(h \times h\) and center O'(0,0,d). From the viewpoint V, we observe the 3D shape through the view plane, onto which part of the vertices are projected. The coordinate of the viewpoint is \(V(0,0,\alpha )\), where \(\alpha = dr/(r-h/2)\) according to the theory of similar triangles. Using the projection function \(F_{pro}:R^{3} \rightarrow R^{3}\) in Eq. 1, we calculate the coordinates of the point p' on the view plane as \(p'=F_{pro}(p)\), where \(p(p_{x},p_{y},p_{z})\) is a vertex of the 3D shape and \(p'(p'_{x},p'_{y},p'_{z})\) is its projection.

$$\begin{aligned} F_{pro}(p)=p'=\left( \frac{(\alpha -d)p_{x}}{\alpha -p_{z}},\ \frac{(\alpha -d)p_{y}}{\alpha -p_{z}},\ d\right) \end{aligned}$$
(1)
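For illustration, a minimal NumPy sketch of the perspective projection determined by this geometry is given below; the function name project_vertices is ours, and the paper's MATLAB implementation may differ in details such as which vertices are kept.

```python
import numpy as np

def project_vertices(P, d, r, h):
    """Project 3D vertices onto the view plane z = d as seen from V(0, 0, alpha).

    P : (N, 3) array of vertex coordinates (shape assumed centered at the origin).
    Returns the (N, 3) projected points p' with p'_z = d.
    """
    alpha = d * r / (r - h / 2.0)          # viewpoint height from similar triangles
    px, py, pz = P[:, 0], P[:, 1], P[:, 2]
    scale = (alpha - d) / (alpha - pz)     # ray from V through p meets z = d at this scale
    return np.stack([px * scale, py * scale, np.full_like(px, d)], axis=1)
```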

3.3 Feature Coding Module for the Views

Discrete Grid Division of the View Plane.

We set \(S_{h}\) and \(S_{v}\) as the horizontal and vertical step lengths, so that \(n_{h}=h/S_{h}\) and \(n_{v}=h/S_{v}\) are the numbers of divisions of the view plane. The 2D plane is then divided into the areas \(A_{ij},i=1,2,\ldots ,n_{h},j=1,2,\ldots ,n_{v}\), defined by

$$\begin{aligned} A_{ij}=\left\{ (x,y,z)\left| \begin{aligned} (i-1)S_{h}\le x< iS_{h}\\ (j-1)S_{v}\le y < jS_{v}\\ z=d \quad \quad \quad \quad \end{aligned} \right. \right\} \end{aligned}$$
(2)

The center point \((x^{*}_{i},y^{*}_{j},d)\) of the area \(A_{ij}\) can be calculated as

$$\begin{aligned} \left\{ \begin{aligned} x^{*}_{i}=(i-1)S_{h}+ \frac{1}{2} S_{h}\\ y^{*}_{j}=(j-1)S_{v}+ \frac{1}{2} S_{v} \end{aligned} \right. \end{aligned}$$
(3)

In order to simplify the expression, a local filter function can be defined as

$$\begin{aligned} I_{A_{ij}}(p)=\left\{ \begin{aligned} 1,&\quad F_{pro}(p)\in A_{ij}\\ 0,&\quad \text {otherwise} \end{aligned} \right. \end{aligned}$$
(4)

The pixel value at the center point of each \(A_{ij}\) is finally calculated as

$$\begin{aligned}&F_{NS}(x^{*}_{i},y^{*}_{j})= \mathop {avg}\limits _{p}(H_{NS}(p)I_{A_{ij}}(p)) \end{aligned}$$
(5)
$$\begin{aligned}&F_{SR}(x^{*}_{i},y^{*}_{j})= \max _{p}(H_{SR}(p)I_{A_{ij}}(p)) \end{aligned}$$
(6)

We thus obtain two categories of views: the NS view, with pixels \((x^{*}_{i},y^{*}_{j},F_{NS}(x^{*}_{i},y^{*}_{j}))\), and the SR view, with pixels \((x^{*}_{i},y^{*}_{j},F_{SR}(x^{*}_{i},y^{*}_{j}))\). \(H_{NS}(p)\) and \(H_{SR}(p)\) are the feature coding functions described in the following section.
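As a concrete reading of Eqs. 2-6, the following NumPy sketch rasterizes per-vertex feature values into an \(n_{h} \times n_{v}\) view; the coordinate shift onto the plane and the handling of empty cells are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def render_view(P_proj, feat, h, Sh, Sv, reduce="avg"):
    """Rasterize per-vertex features into a view image.

    P_proj : (N, 3) projected points on the plane z = d (e.g. from project_vertices).
    feat   : (N,) per-vertex feature values, H_NS(p) or H_SR(p).
    reduce : "avg" for the NS view (Eq. 5) or "max" for the SR view (Eq. 6).
    Cells that receive no projected vertex are left at 0 (an assumption).
    """
    nh, nv = int(h / Sh), int(h / Sv)
    view = np.zeros((nh, nv))
    # Shift coordinates so the plane spans [0, h) x [0, h) before binning
    # (the plane is centered at O', so this offset is an assumption).
    i = np.clip(((P_proj[:, 0] + h / 2) / Sh).astype(int), 0, nh - 1)
    j = np.clip(((P_proj[:, 1] + h / 2) / Sv).astype(int), 0, nv - 1)
    for a in range(nh):
        for b in range(nv):
            mask = (i == a) & (j == b)          # vertices projected into A_ab
            if mask.any():
                view[a, b] = feat[mask].mean() if reduce == "avg" else feat[mask].max()
    return view
```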

Feature Coding for the Views. We design two coding functions to encode the NS and SR features of the non-rigid 3D shape as pixel values. The views developed this way contain not only the local shape features of the observed object but also the positional relationships between those features. The pixel values reflect geometric properties such as structural relationships, local details, and topological structure of the 3D shape, so the view-based features are assembled into comprehensive descriptors of the 3D shape.

  (1) NS features as the pixel values for the views

The NS features [38] of a 3D shape describe its local structure and details. A view whose pixel values are constructed from NS features is called an NS view here. By optimizing the heat kernel signature (HKS) features, we obtain the NS features at the vertices of the 3D shape according to

$$\begin{aligned} H_{NS}(p)=F\left[ \frac{d}{d\tau }\log K_{\beta ^{\tau }}(p)\right] \quad \text {and} \quad K_{\beta ^{\tau }}(p)=\sum _{i\ge 0} e^{-\varLambda _{i}\beta ^{\tau }}\varPhi ^{2}_{i}(p) \end{aligned}$$
(7)

where \(\varLambda _{i}\) and \(\varPhi _{i}\) are the eigenvalues and eigenfunctions of the discrete Laplace-Beltrami operator, \(\beta \) is a constant, and \(\tau \in [lb(t_{min}), lb(t_{max})]\) in which \(t_{min}\) and \(t_{max}\) are the critical time values beyond which NS features of the 3D shape no longer change.
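The sketch below computes a scale-invariant HKS-style feature of this form from precomputed Laplace-Beltrami eigenpairs; the sampling of \(\tau \), the use of the Fourier amplitude, and the choice of which component to keep follow common practice for such features and are assumptions of this illustration, not specifics of [38].

```python
import numpy as np

def ns_feature(Lam, Phi, beta=2.0, taus=np.linspace(-16.0, 8.0, 100)):
    """Scale-invariant HKS-style feature per vertex (illustration of Eq. 7).

    Lam : (k,) non-negative eigenvalues of the discrete Laplace-Beltrami operator.
    Phi : (N, k) corresponding eigenfunctions sampled at the N vertices.
    Returns an (N,) scalar feature (one Fourier-amplitude component); which
    component(s) to keep is an assumption of this sketch.
    """
    t = beta ** taus                                  # logarithmically sampled times
    # K[v, s] = sum_i exp(-Lam_i * t_s) * Phi_i(v)^2  (heat kernel signature)
    K = (Phi ** 2) @ np.exp(-np.outer(Lam, t))
    logK = np.log(K + 1e-12)
    dlogK = np.diff(logK, axis=1)                     # discrete d/dtau of log K
    amp = np.abs(np.fft.rfft(dlogK, axis=1))          # Fourier amplitude over tau
    return amp[:, 1]                                  # one illustrative component
```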

As shown in Fig. 4, vertices with the same color have similar NS features. The NS features are isometry-invariant and robust to small perturbations such as minor topological changes or noise.

Fig. 4. The NS features of the man models.

Fig. 5. The NS features of the house and man models.

As shown in Fig. 5, (a) and (b), as well as (c) and (d), are the same 3D models at different scales, yet their NS features are similar; although the scale changes, the NS features remain robust. For different types of 3D shapes, such as (a) and (c) or (b) and (d), the NS features are distinctly different.

  (2) SR features as the pixel values for the views

The minimum enclosing sphere of the 3D shape is adopted, as shown in Fig. 6. According to Eq. 8, we obtain the pixel value \(H_{SR}\), which describes the global structural features at the vertex p of the 3D shape.

Fig. 6. The model for extracting the SR features.

$$\begin{aligned} H_{SR}(p)=(\cos \theta + \cos \varphi )\,dis(Op) \end{aligned}$$
(8)

where \((\theta ,\varphi ,r)\) are the spherical coordinates of the vertex p of the 3D shape, and dis(Op) is the distance between the point O and the point p.
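A small NumPy sketch of Eq. 8 follows; the convention used to recover \((\theta ,\varphi )\) from Cartesian coordinates is an assumption, since the paper does not spell it out.

```python
import numpy as np

def sr_feature(P):
    """SR feature per vertex (Eq. 8): (cos(theta) + cos(phi)) * dis(O, p).

    P : (N, 3) vertex coordinates with the centroid O at the origin.
    theta is taken as the polar angle from the Z axis and phi as the azimuth
    in the XY plane (a conventional choice, not stated in the paper).
    """
    dist = np.linalg.norm(P, axis=1)                        # dis(O, p)
    safe = np.maximum(dist, 1e-12)                          # avoid division by zero at O
    theta = np.arccos(np.clip(P[:, 2] / safe, -1.0, 1.0))   # polar angle
    phi = np.arctan2(P[:, 1], P[:, 0])                      # azimuth
    return (np.cos(theta) + np.cos(phi)) * dist
```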

3.4 CNNs Fusion Module for Non-rigid 3D Shape

In this section, we develop two CNNs: one based on traditional networks (CNNs-T) and one based on ResNet (CNNs-R), as shown in Figs. 7 and 8, respectively. For each CNN, the input consists of the two categories of views obtained by the previously defined modules.

Motivated by [10], we define a composite function of three consecutive operations for each block of each CNN: batch normalization (BN) [11], followed by a rectified linear unit (ReLU) [6] and a \(3 \times 3\) convolution (Conv). The two kinds of views are fed through \(CNN_{1}\) to extract view-specific features, which are then fused at the view-pooling layer and passed to \(CNN_{2}\) for further feature extraction.
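As a sketch, this composite function could be written as follows (the paper's implementation is in MATLAB; the PyTorch module below is illustrative, with channel counts left as parameters).

```python
import torch.nn as nn

class BnReluConv(nn.Module):
    """Composite function of one block: BN -> ReLU -> 3x3 Conv (pre-activation style)."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        # Padding of 1 keeps the feature-map size fixed, as stated in Sect. 3.4.
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))
```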

Fig. 7. The convolutional neural networks based on traditional networks (CNNs-T).

CNNs-T. The training process of CNNs-T is illustrated in Fig. 7. Traditional convolutional feed-forward networks use the output of the \(l_{th}\) layer as the input to the \((l+1)_{th}\) layer [13].

Fig. 8. The convolutional neural networks based on ResNet (CNNs-R).

CNNs-R. The training process of CNNs-R is illustrated in Fig. 8. In ResNet [9], a skip connection is added that bypasses the non-linear transformation with an identity mapping: the output of the \(l_{th}\) layer is used as the input to both the \((l+1)_{th}\) and \((l+2)_{th}\) layers. The advantage of ResNet is that the gradient can flow directly from later layers to earlier layers through the identity mapping.

View-Pooling Layer. View-pooling layers are closely related to max-pooling layers, and the only difference is that the pooling operations are carried out in three dimensions.
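One common realization of such a layer, following the multi-view setting of [31], is an element-wise maximum taken across the view dimension of the stacked feature maps; whether additional spatial pooling is applied here is not stated, so the sketch below reflects only our reading.

```python
import torch

def view_pool(features):
    """Element-wise max across views.

    features : tensor of shape (batch, views, channels, height, width),
               e.g. the CNN_1 feature maps of the NS and SR views stacked along dim 1
               with torch.stack([f_ns, f_sr], dim=1).
    Returns a (batch, channels, height, width) tensor that is fed to CNN_2.
    """
    pooled, _ = features.max(dim=1)
    return pooled
```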

Implementation Details. \(CNN_{1}\) has five blocks, each with the same number of layers. For each \(3\times 3\) convolutional layer, every side of the input is zero-padded by one pixel to keep the feature-map size fixed. At the end of \(CNN_{1}\), view-pooling is performed and \(CNN_{2}\) is attached, so the network forms three parts. At the end of \(CNN_{2}\), two fully-connected layers and a softmax classifier are used. The numbers of feature maps for the successive blocks are 32, 32, 64, 64, 64, 128, 128, and 256.
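Read this way, with the eight channel counts split as five blocks (32, 32, 64, 64, 64) in \(CNN_{1}\) and three blocks (128, 128, 256) in \(CNN_{2}\), a CNNs-T-style network could be sketched as below; the weight sharing across views, the spatial pooling between blocks, and the fully-connected widths are all assumptions of this sketch rather than details from the paper.

```python
import torch
import torch.nn as nn

class BnReluConv(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
            nn.Conv2d(cin, cout, kernel_size=3, padding=1, bias=False))

    def forward(self, x):
        return self.body(x)

class FusionViewCNN(nn.Module):
    """CNNs-T-style sketch: CNN_1 per view -> view-pooling -> CNN_2 -> FC -> softmax."""

    def __init__(self, num_classes=30, in_channels=1):
        super().__init__()
        widths1, widths2 = [32, 32, 64, 64, 64], [128, 128, 256]
        blocks, c = [], in_channels
        for w in widths1:
            blocks += [BnReluConv(c, w), nn.MaxPool2d(2)]   # spatial pooling is assumed
            c = w
        self.cnn1 = nn.Sequential(*blocks)                  # shared across both views (assumed)
        blocks2 = []
        for w in widths2:
            blocks2 += [BnReluConv(c, w)]
            c = w
        self.cnn2 = nn.Sequential(*blocks2, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(nn.Linear(c, 256), nn.ReLU(inplace=True),
                                  nn.Linear(256, num_classes))   # FC widths assumed

    def forward(self, ns_view, sr_view):
        f = torch.stack([self.cnn1(ns_view), self.cnn1(sr_view)], dim=1)
        fused, _ = f.max(dim=1)                             # view-pooling (element-wise max)
        return self.head(self.cnn2(fused))                  # logits; softmax applied in the loss
```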

In our experiments, we use the above two network structures to extract the descriptors of the 3D shapes and implement the classification and retrieval of 3D shapes.

4 Experiment Result and Analysis

All algorithms proposed in this work are implemented and tested using MATLAB 2017b on a PC with the following specifications: CPU: Intel(R) Core(TM) i9-7960X 2.80 GHz; GPU: NVIDIA GeForce GTX 1080 Ti; RAM: 16 GB DDR4; OS: Windows 10 SP1 (64-bit).

4.1 Dataset

We evaluate our method on the SHREC [14] database of watertight meshes. SHREC contains 600 3D shapes from 30 categories, of which 480 are used for training and 120 for testing. Thirty randomly selected shapes, one from each of the 30 categories, are shown in Fig. 9.

Fig. 9. 30 selected 3D shapes from the SHREC database.

4.2 The NS and SR Features of the 3D Shapes

Fig. 10. The NS features (a) and SR features (b) of the selected 3D shapes.

Based on the feature coding module in Sect. 3.3, we extract the NS and SR features of the 30 selected 3D shapes. Color is used to represent the NS and SR features of the 3D shapes, as shown in Fig. 10(a) and (b). These color patterns reflect the local details and structural relations of the 3D shapes, and the similarity between the NS (or SR) features of different categories is low.

4.3 The Examples of NS and SR Views for Non-rigid 3D Shapes

Based on the projection and feature coding modules described in Sect. 3, we extract the views of the 3D shapes from the 30 categories to analyze their expressive capability. As shown in Fig. 11(a) and (b), the NS and SR views correspond to the 3D shapes in Fig. 9. The views reflect the local details and structural relations of the 3D shapes well, and the similarities between views of different categories are low. Through these two kinds of views, the 3D shapes can be described efficiently, and the similarities and differences between shapes can be distinguished.

Fig. 11. The NS views (a) and SR views (b) of the selected 3D shapes.

4.4 Non-rigid 3D Shape Classification and Retrieval Efficiency Analysis

Fig. 12. Comparison of parameter efficiency (a) and view efficiency (b).

The Comparison of CNNs-T and CNNs-R. The results in Fig. 12(a) show that CNNs-R uses parameters more efficiently, consistently achieving lower top-1 errors than CNNs-T with the same number of parameters. Moreover, CNNs-R also exploits the view features more effectively, delivering better accuracy with the same views (e.g., 83.76% vs. 66.35%, 89.29% vs. 75.68%, 97.44% vs. 78.41%), as shown in Fig. 12(b).

The Retrieval Results of CNNs-T and CNNs-R. In this experiment, 3D man and ant shapes are chosen as the query shapes. We compare the 3D shape retrieval performance of CNNs-R and CNNs-T; the results are shown in Fig. 13. In Fig. 13(a) and (b), all retrieved shapes are relevant: although the 3D ant shape is complex and exhibits many forms of deformation, CNNs-R delivers good retrieval performance. In contrast, the results of CNNs-T contain one irrelevant retrieval for the 3D man shape and two for the 3D ant shape, as shown in Fig. 13(c) and (d). These results verify the superior performance of the proposed CNNs-R.

Fig. 13. The retrieval results.

Fig. 14. Precision-recall curves among different algorithms on the SHREC-11 dataset.

Table 1. Comparison results among different algorithms on the SHREC dataset.

Comparative Analysis with State-of-the-Art Methods. We now compare our methods with state-of-the-art approaches, including Zer [20], LFD [4], SN [35], Conf [7], Sph [25], and Geometry Image [29]. The non-rigid 3D shape classification and retrieval results are summarized in Table 1 and Fig. 14. Both CNNs-T and CNNs-R perform well: CNNs-T achieves a classification accuracy of 82.7% and a retrieval MAP of 76%, the latter 4% higher than Geometry Image [29]. CNNs-R has the best performance of all, with a classification accuracy of 97.4% and a retrieval MAP of 81%, which are 0.8% and 9% higher than Geometry Image [29], respectively.

5 Conclusion

In this paper, we put forward an FVCNN framework for classifying and retrieving non-rigid 3D shapes. First, we propose a projection module to transform the non-rigid 3D shape onto a 2D view plane and a feature coding module to extract its NS and SR features. The NS and SR views are then generated by using the NS and SR features as pixel values, respectively, which express the 3D shapes efficiently. Finally, we propose a CNNs fusion module to extract the view-based features and fuse them to obtain deep fusion features as the 3D shape descriptors. The proposed neural network architecture outperforms more traditional, non-learning-based approaches, but there is still much room for improvement.

In the future, we wish to build upon these insights for generative models of 3D shapes with encoded views instead of traditional images. Another direction is to integrate the discriminative power of view-based approaches with the robustness of approaches that reason more locally about geometry.