
Neurocomputing

Volume 174, Part B, 22 January 2016, Pages 988-998

An efficient and effective convolutional auto-encoder extreme learning machine network for 3D feature learning

https://doi.org/10.1016/j.neucom.2015.10.035

Abstract

3D shape features play a crucial role in graphics applications such as 3D shape matching, recognition, and retrieval. Various 3D shape descriptors have been developed over the last two decades; however, existing descriptors are hand-crafted features that are labor-intensive to design and cannot extract discriminative information from large datasets. In this paper, we propose a rapid 3D feature learning method, the convolutional auto-encoder extreme learning machine (CAE-ELM), which combines the advantages of the convolutional neural network (CNN), the auto-encoder (AE), and the extreme learning machine (ELM), and therefore learns features both more accurately and more quickly than existing methods. In addition, we define a novel architecture based on CAE-ELM that accepts two types of 3D shape representation, voxel data and signed distance field (SDF) data, as inputs to extract the global and local features of 3D shapes: voxel data describe structural information, whereas SDF data capture details of the shape. The proposed CAE-ELM can also be used in practical graphics applications such as 3D shape completion. Experiments show that the features extracted by CAE-ELM are superior to existing hand-crafted features and to those of other deep learning methods or ELM models. Moreover, the classification accuracy of the proposed architecture surpasses that of other methods on ModelNet10 (91.4%) and ModelNet40 (84.35%), and the training process runs faster than existing deep learning methods by approximately two orders of magnitude.

Introduction

3D shape feature extraction is a vital issue in the high-level understanding of 3D shapes. Extensive efforts have been devoted to this problem with the aid of recent advances in deep learning. Existing deep-learning-based feature extraction approaches can be broadly categorized into semi-automatic and fully automatic methods.

In semi-automatic methods such as [1], [2], researchers first extract several popular hand-crafted features from the input 3D shapes and then use deep learning to combine these features further. This category of methods relies strongly on the adopted human-designed features, whose extraction is time consuming; hence, these methods cannot handle large-scale 3D datasets.

Numerous fully automatic deep learning methods have been proposed recently, such as the convolutional deep belief network (CDBN) [3], the auto-encoder (AE) [4], deep Boltzmann machines [5], the convolutional neural network (CNN) [6], and stacked local convolutional AEs [7]. These techniques can be used to learn 3D features because of their feature learning capability; however, most of them were originally proposed for 2D image classification tasks.

A 3D shape at even a moderate resolution contains as many elements as a high-resolution image (e.g., a 64×64×64 voxel grid holds 262,144 values, exactly as many as a 512×512 image); thus, training deep networks on large-scale 3D datasets is time consuming. Furthermore, mastering this category of feature learning methods takes considerable effort because of the black-box nature of deep learning. Most of these methods convert 3D shapes into 2D representations for input [7], [8], [9], so much of the 3D geometric information is lost. Several works [3], [10] instead take 3D cubes, i.e., volumetric representations of 3D shapes, as inputs; however, their training processes are time consuming because of the additional dimension of the input data, and their input resolution is therefore limited.

To overcome the shortcomings of existing methods, we propose a novel 3D shape feature extraction method, the convolutional auto-encoder extreme learning machine (CAE-ELM), which combines the advantages of the CNN, the AE, and the extreme learning machine (ELM). The AE is a typical unsupervised learning algorithm that can extract good features without supervised labels; however, the AE network is fully connected, so a large number of parameters must be learned. The CNN restricts the connections between the hidden layer and the input layer through locally connected networks; nevertheless, it is computationally expensive on 3D shape datasets because of its convolution operations. To reduce the computational complexity, we adopt the ELM [11] for its high efficiency and effectiveness.
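
To make the role of the ELM concrete, the following is a minimal, illustrative sketch of an ELM auto-encoder in Python (NumPy only): the hidden-layer weights are random and fixed, and the output weights are obtained in closed form by ridge regression, which is what makes ELM training fast. The function and parameter names (`elm_autoencoder`, `n_hidden`, `reg`) are our own illustration; this is not the full CAE-ELM network, which additionally involves convolution and pooling.

```python
import numpy as np

def elm_autoencoder(X, n_hidden=256, reg=1e-3, seed=0):
    """Illustrative ELM auto-encoder: random hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Input weights and biases are random and never trained (the ELM principle).
    W = rng.standard_normal((n_features, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # hidden activations, (n_samples, n_hidden)
    # Ridge-regularized least squares: beta maps hidden activations back to the input.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)  # (n_hidden, n_features)
    features = X @ beta.T                            # ELM-AE-style feature mapping
    return features, beta
```

In such a scheme, a single closed-form solve replaces the iterative back-propagation used by conventional deep auto-encoders, which is the main source of the speed-up.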

Additionally, different input representations capture different aspects of a shape. Voxel data describe the structural information of 3D shapes because each voxel takes only the value 0 or 1, indicating whether it lies outside or inside the mesh surface, respectively. Signed distance field (SDF) data are a grid sampling of the minimum distance to the surface of an object represented as a polygonal model; following the common convention, values are negative inside the object and positive outside, so an SDF conveys additional details of the 3D shape. To extract global and local features jointly, we define a novel architecture that accepts both voxel and SDF data as inputs. By combining these two types of data, our architecture can classify 3D shapes effectively.
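
As an illustration of how the two representations relate, the sketch below derives an approximate SDF grid from a binary voxel occupancy grid using Euclidean distance transforms; it assumes SciPy is available and follows the sign convention above (positive outside, negative inside). This is a generic conversion, not necessarily the exact preprocessing used in the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def voxels_to_sdf(occupancy):
    """Approximate signed distance field from a binary voxel grid (1 = inside).

    Returns positive distances outside the object and negative distances inside,
    matching the sign convention described above.
    """
    occ = occupancy.astype(bool)
    dist_outside = distance_transform_edt(~occ)   # distance from outside voxels to the object
    dist_inside = distance_transform_edt(occ)     # distance from inside voxels to the exterior
    return dist_outside - dist_inside
```

For example, `voxels_to_sdf(voxel_grid)` applied to a 30×30×30 occupancy array yields an SDF grid of the same resolution.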

The proposed CAE-ELM can also be used in practical graphics applications such as 3D shape completion. Optical acquisition devices often produce incomplete 3D shape data because of occlusion and unfavorable surface reflectance properties, and such incomplete shapes are challenging to repair. To fix incomplete data, we compare the features of broken and complete shapes extracted before the CAE-ELM classifier and thereby obtain the broken locations and their values. Although the completion results are not perfect, CAE-ELM offers a new approach to this problem.
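
The completion procedure is only summarized above, so the sketch below illustrates just the underlying retrieval idea: compare learned feature vectors to find a similar complete shape and treat the voxel difference as the set of broken locations. The function name `complete_by_retrieval` and the database arrays are hypothetical, and this is not the paper's exact completion algorithm.

```python
import numpy as np

def complete_by_retrieval(broken_voxels, broken_feature, database_features, database_voxels):
    """Simplified illustration: retrieve the most similar complete shape in feature
    space and fill the voxels it occupies that are missing in the broken shape."""
    dists = np.linalg.norm(database_features - broken_feature, axis=1)
    nearest = database_voxels[np.argmin(dists)]                 # closest complete shape
    missing = np.logical_and(nearest == 1, broken_voxels == 0)  # candidate broken locations
    repaired = broken_voxels.copy()
    repaired[missing] = 1
    return repaired, missing
```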

The contributions of our approach are summarized as follows:

  • (1) CAE-ELM: We propose a new ELM-based network that performs well and learns quickly. To the best of our knowledge, our model is the first to combine the advantages of the CNN, the AE, and the ELM to learn features of 3D shapes, and it has been used in practical graphics applications. We provide the source code so that researchers can master it in a short time.

  • (2) Increased classification accuracy: The classification accuracy of the designed architecture is higher than that of other methods [10], [12], [13], [9] on ModelNet10 (91.41%) and ModelNet40 (84.35%).

  • (3) 3D shape completion: CAE-ELM can repair a broken 3D shape by using the features extracted before the classifier.

  • (4) Rapid 3D shape feature extraction: Our method runs faster than existing deep learning methods by approximately two orders of magnitude, thus facilitating large-scale 3D shape analysis.

The experimental results show that the features learned by CAE-ELM significantly outperform hand-crafted features and those of other deep learning methods in 3D shape classification. CAE-ELM can also repair the broken locations of 3D shapes using the learned features for 3D shape completion. Furthermore, our method is efficient and easy to implement and is therefore practical for real 3D applications.

Section snippets

3D shape descriptors

3D shape descriptors play a crucial role in graphics applications such as 3D shape matching, recognition, and retrieval [14], [15], [16], [17].

A variety of 3D shape descriptors have been developed over the last two decades [18], [13], [19], [15]. Existing 3D descriptors are hand-crafted features that are labor-intensive to design and unable to extract discriminative information from the data. Instead, we learn shape features from 3D shapes with an automatic feature learning method.

3D feature learning via deep learning

Convolutional auto-encoder ELM for 3D feature learning

In this section, the model (CAE-ELM) for extracting features from 3D shapes is formulated and described in detail.

Experiments

In this section, we demonstrate the performance of CAE-ELM and explore its applicability. First, the classification accuracy and training time of the method are evaluated on 3D shape datasets. Subsequently, its performance on 2D images is described. Finally, we apply the features extracted by CAE-ELM to repair broken 3D shapes. We implemented the method in MATLAB 2014b and ran the experiments on a computer with an Intel(R) Xeon E5-2650 2.0 GHz CPU and 64 GB of RAM.

Conclusion

In this paper, we propose a new method called CAE-ELM that can extract features from 3D shapes. In contrast to existing 3D shape feature learning methods, our method combines the advantages of convolution, pooling, and AE processes; moreover, this technique uses both voxel and SDF data as inputs to improve performance. In the future, we will examine this approach further in three directions. First, the CAE-ELM in this work is a single-layer network. Multi-layer models can extract considerably

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful comments. This work was supported by the National Natural Science Foundation of China (Nos. 61125201, 61402499, 61379103, and U1435219).

References (40)

  • Z. Zhu, X. Wang, S. Bai, C. Yao, X. Bai, Deep learning representation using autoencoder for 3d shape retrieval, CoRR...
  • Y. LeCun, Y. Bengio, Convolutional networks for images, speech, and time series, The handbook of brain theory and...
  • Z. Zhu, X. Wang, S. Bai, C. Yao, X. Bai, Deep learning representation using autoencoder for 3d shape retrieval, arXiv...
  • Z. Xie, K. Xu, W. Shan, L. Liu, Y. Xiong, H. Huang, Projective feature learning for 3d shapes with multi-view depth...
  • Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, J. Xiao, 3d shapenets: a deep representation for volumetric...
  • D.-Y. Chen, X.-P. Tian, Y.-T. Shen, M. Ouhyoung, On visual similarity based 3d model retrieval, in: Computer graphics...
  • M. Kazhdan, T. Funkhouser, S. Rusinkiewicz, Rotation invariant spherical harmonic representation of 3d shape...
  • T. Funkhouser et al., A search engine for 3D models, ACM Trans. Graph. (TOG), 2003.
  • L. Shapira et al., Contextual part analogies in 3D objects, Int. J. Comput. Vis., 2010.
  • P. Heider, A. Pierre-Pierre, R. Li, C. Grimm, Local shape descriptors, a survey and evaluation, in: Proceedings of...

Yueqing Wang was born in 1988. He received his B.S. degree in Computer Science and Technology from Tsinghua University in 2010 and his M.S. degree in Computer Science and Technology from the National University of Defense Technology in 2012, and he is now a Ph.D. candidate at the National University of Defense Technology. His research interests include high performance computer architecture, parallel computing, and machine learning.

Zhige Xie was born in 1984. He received his M.S. degree in Computer Science and Technology from the National University of Defense Technology in 2010 and is now a Ph.D. candidate there. His research interests include computer graphics and machine learning.

Kai Xu was born in 1982. He is an Assistant Professor in the School of Computer Science, National University of Defense Technology (NUDT). His research interests are in the areas of computer graphics, especially geometry processing. His current topics of interest include shape analysis, high-level geometry processing, and data-driven shape modeling.

Yong Dou was born in 1966. He is a professor, Ph.D. supervisor, and senior member of the China Computer Federation (E200009248). He received his B.S., M.S., and Ph.D. degrees in Computer Science and Technology from the National University of Defense Technology in 1995. His research interests include high performance computer architecture, high performance embedded microprocessors, reconfigurable computing, and bioinformatics. He is a member of the IEEE and the ACM.

Yuanwu Lei was born in 1982. He received his B.S. degree in Computer Science and Technology from North China Electric Power University, Baoding, China, in 2005, and his M.S. degree in Computer Science and Technology from the National University of Defense Technology in 2007, and he is now a Ph.D. candidate at the National University of Defense Technology. His research interests include high performance computer architecture, high-precision computation, parallel computing, and reconfigurable computing.
