
Neurocomputing

Volume 174, Part B, 22 January 2016, Pages 988-998

An efficient and effective convolutional auto-encoder extreme learning machine network for 3D feature learning

https://doi.org/10.1016/j.neucom.2015.10.035

Abstract

3D shape features play a crucial role in graphics applications such as 3D shape matching, recognition, and retrieval. Various 3D shape descriptors have been developed over the last two decades; however, existing descriptors are hand-crafted features that are labor-intensive to design and cannot extract discriminative information from large datasets. In this paper, we propose a rapid 3D feature learning method, the convolutional auto-encoder extreme learning machine (CAE-ELM), which combines the advantages of the convolutional neural network (CNN), the auto-encoder (AE), and the extreme learning machine (ELM), and therefore learns features both more accurately and more quickly than existing methods. In addition, we define a novel architecture based on CAE-ELM that accepts two types of 3D shape representation, voxel data and signed distance field (SDF) data, as inputs to extract the global and local features of 3D shapes: voxel data describe structural information, whereas SDF data capture details of the shape. The proposed CAE-ELM can also be used in practical graphics applications such as 3D shape completion. Experiments show that the features extracted by CAE-ELM are superior to existing hand-crafted features and to those of other deep learning methods or ELM models. Moreover, the classification accuracy of the proposed architecture surpasses that of other methods on ModelNet10 (91.4%) and ModelNet40 (84.35%), and the training process runs faster than existing deep learning methods by approximately two orders of magnitude.

Introduction

3D shape feature extraction is a vital issue in the high-level understanding of 3D shapes. Extensive efforts have been devoted to this problem with the aid of recent advances in deep learning. Existing deep-learning-based feature extraction approaches can be broadly categorized into semi-automatic and fully automatic methods.

In semi-automatic methods such as [1], [2], researchers first extract several popular hand-crafted features from the input 3D shapes and then use deep learning to combine these features further. This category of methods relies strongly on the adopted human-designed features, whose extraction is time consuming; hence, these methods cannot handle large-scale 3D datasets.

Numerous fully automatic deep learning methods have been proposed recently, such as the convolutional deep belief network (CDBN) [3], the auto-encoder (AE) [4], deep Boltzmann machines [5], the convolutional neural network (CNN) [6], and stacked local convolutional AEs [7]. These techniques can be used to learn 3D features because of their feature learning capability; however, most of them were originally proposed for 2D image classification tasks.

A 3D shape at even a moderate resolution contains as many elements as a high-resolution image (e.g., a 64×64×64 voxel grid holds 262,144 values, exactly as many as a 512×512 image); thus, training deep networks on large-scale 3D datasets is time consuming. Furthermore, mastering this category of feature learning methods takes considerable effort because of the black-box nature of deep learning. Most of these methods convert 3D shapes into 2D representations for input [7], [8], [9], so much of the 3D geometric information is lost. Several works [3], [10] instead take 3D cubes, i.e., volumetric representations of 3D shapes, as inputs; however, their training processes are time consuming because of the additional dimension of the input data, and their input resolution is therefore limited.

To overcome the shortcomings of existing methods, we propose a novel 3D shape feature extraction method, the convolutional auto-encoder extreme learning machine (CAE-ELM), which combines the advantages of the CNN, the AE, and the extreme learning machine (ELM). The AE is a typical unsupervised learning algorithm that can extract good features without supervised labels; however, the AE network is fully connected, so a large number of parameters must be learned. The CNN restricts the connections between the hidden layer and the input layer through locally connected networks; nevertheless, it is computationally expensive on 3D shape datasets because of its convolution operations. To reduce the computational complexity, we adopt the ELM [11] for its high efficiency and effectiveness.
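
To make the role of the ELM concrete, the following is a minimal, illustrative sketch of an ELM auto-encoder in Python (NumPy only): the hidden-layer weights are random and fixed, and the output weights are obtained in closed form by ridge regression, which is what makes ELM training fast. The function and parameter names (`elm_autoencoder`, `n_hidden`, `reg`) are our own illustration; this is not the full CAE-ELM network, which additionally involves convolution and pooling.

```python
import numpy as np

def elm_autoencoder(X, n_hidden=256, reg=1e-3, seed=0):
    """Illustrative ELM auto-encoder: random hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Input weights and biases are random and never trained (the ELM principle).
    W = rng.standard_normal((n_features, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # hidden activations, (n_samples, n_hidden)
    # Ridge-regularized least squares: beta maps hidden activations back to the input.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)  # (n_hidden, n_features)
    features = X @ beta.T                            # ELM-AE-style feature mapping
    return features, beta
```

In such a scheme, a single closed-form solve replaces the iterative back-propagation used by conventional deep auto-encoders, which is the main source of the speed-up.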

Additionally, different input representations capture different aspects of a shape. Voxel data describe the structural information of 3D shapes because each voxel takes only the value 0 or 1, indicating whether it lies outside or inside the mesh surface, respectively. Signed distance field (SDF) data are a grid sampling of the minimum distance to the surface of an object represented as a polygonal model; following the common convention, values are negative inside the object and positive outside, so an SDF conveys additional details of the 3D shape. To extract global and local features jointly, we define a novel architecture that accepts both voxel and SDF data as inputs. By combining these two types of data, our architecture can classify 3D shapes effectively.
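
As an illustration of how the two representations relate, the sketch below derives an approximate SDF grid from a binary voxel occupancy grid using Euclidean distance transforms; it assumes SciPy is available and follows the sign convention above (positive outside, negative inside). This is a generic conversion, not necessarily the exact preprocessing used in the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def voxels_to_sdf(occupancy):
    """Approximate signed distance field from a binary voxel grid (1 = inside).

    Returns positive distances outside the object and negative distances inside,
    matching the sign convention described above.
    """
    occ = occupancy.astype(bool)
    dist_outside = distance_transform_edt(~occ)   # distance from outside voxels to the object
    dist_inside = distance_transform_edt(occ)     # distance from inside voxels to the exterior
    return dist_outside - dist_inside
```

For example, `voxels_to_sdf(voxel_grid)` applied to a 30×30×30 occupancy array yields an SDF grid of the same resolution.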

The proposed CAE-ELM can also be used in practical graphics applications such as 3D shape completion. Optical acquisition devices often produce incomplete 3D shape data because of occlusion and unfavorable surface reflectance properties, and such incomplete shapes are challenging to repair. To fix incomplete data, we compare the features of broken and complete shapes extracted before the CAE-ELM classifier and thereby obtain the broken locations and their values. Although the completion results are not perfect, CAE-ELM offers a new approach to this problem.
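
The completion procedure is only summarized above, so the sketch below illustrates just the underlying retrieval idea: compare learned feature vectors to find a similar complete shape and treat the voxel difference as the set of broken locations. The function name `complete_by_retrieval` and the database arrays are hypothetical, and this is not the paper's exact completion algorithm.

```python
import numpy as np

def complete_by_retrieval(broken_voxels, broken_feature, database_features, database_voxels):
    """Simplified illustration: retrieve the most similar complete shape in feature
    space and fill the voxels it occupies that are missing in the broken shape."""
    dists = np.linalg.norm(database_features - broken_feature, axis=1)
    nearest = database_voxels[np.argmin(dists)]                 # closest complete shape
    missing = np.logical_and(nearest == 1, broken_voxels == 0)  # candidate broken locations
    repaired = broken_voxels.copy()
    repaired[missing] = 1
    return repaired, missing
```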

The contributions of our approach are summarized as follows:

  • (1) CAE-ELM: We propose a new ELM-based network that performs well and learns quickly. To the best of our knowledge, our model is the first to combine the advantages of the CNN, the AE, and the ELM to learn features of 3D shapes, and it has been used in practical graphics applications. We provide the source code so that researchers can master it in a short time.

  • (2) Increased classification accuracy: The classification accuracy of the designed architecture is higher than that of other methods [10], [12], [13], [9] on ModelNet10 (91.41%) and ModelNet40 (84.35%).

  • (3) 3D shape completion: CAE-ELM can repair a broken 3D shape by using the features extracted before the classifier.

  • (4) Rapid 3D shape feature extraction: Our method runs faster than existing deep learning methods by approximately two orders of magnitude, thus facilitating large-scale 3D shape analysis.

The experimental results show that the features learned by CAE-ELM significantly outperform hand-crafted features and those of other deep learning methods in 3D shape classification. CAE-ELM can also repair the broken locations of 3D shapes using the learned features for 3D shape completion. Furthermore, our method is efficient and easy to implement and is therefore practical for real 3D applications.

Section snippets

3D shape descriptors

3D shape descriptors play a crucial role in graphics applications such as 3D shape matching, recognition, and retrieval [14], [15], [16], [17].

A variety of 3D shape descriptors have been developed over the last two decades [18], [13], [19], [15]. Existing 3D descriptors are hand-crafted features that are labor-intensive to design and unable to extract discriminative information from the data. Instead, we learn shape features from 3D shapes with an automatic feature learning method.

3D feature learning via deep learning

Convolutional auto-encoder ELM for 3D feature learning

In this section, the model (CAE-ELM) for extracting features from 3D shapes is formulated and described in detail.

Experiments

In this section, we demonstrate the performance of CAE-ELM and explore its applicability. First, the classification accuracy and training time of the method are evaluated on 3D shape datasets. Subsequently, its performance on 2D images is described. Finally, we apply the features extracted by CAE-ELM to repair broken 3D shapes. We implemented the method in MATLAB 2014b and ran the experiments on a computer with an Intel(R) Xeon E5-2650 2.0 GHz CPU and 64 GB of RAM.

Conclusion

In this paper, we propose a new method called CAE-ELM that can extract features from 3D shapes. In contrast to existing 3D shape feature learning methods, our method combines the advantages of convolution, pooling, and AE processes; moreover, this technique uses both voxel and SDF data as inputs to improve performance. In the future, we will examine this approach further in three directions. First, the CAE-ELM in this work is a single-layer network. Multi-layer models can extract considerably

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful comments. This work was supported by the National Natural Science Foundation of China (Nos. 61125201, 61402499, 61379103, and U1435219).

References (40)

  • Z. Zhu, X. Wang, S. Bai, C. Yao, X. Bai, Deep learning representation using autoencoder for 3d shape retrieval, CoRR...
  • Y. LeCun, Y. Bengio, Convolutional networks for images, speech, and time series, The handbook of brain theory and...
  • Z. Zhu, X. Wang, S. Bai, C. Yao, X. Bai, Deep learning representation using autoencoder for 3d shape retrieval, arXiv...
  • Z. Xie, K. Xu, W. Shan, L. Liu, Y. Xiong, H. Huang, Projective feature learning for 3d shapes with multi-view depth...
  • Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, J. Xiao, 3d shapenets: a deep representation for volumetric...
  • D.-Y. Chen, X.-P. Tian, Y.-T. Shen, M. Ouhyoung, On visual similarity based 3d model retrieval, in: Computer graphics...
  • M. Kazhdan, T. Funkhouser, S. Rusinkiewicz, Rotation invariant spherical harmonic representation of 3d shape...
  • T. Funkhouser et al., A search engine for 3D models, ACM Trans. Graph. (TOG), 2003.
  • L. Shapira et al., Contextual part analogies in 3D objects, Int. J. Comput. Vis., 2010.
  • P. Heider, A. Pierre-Pierre, R. Li, C. Grimm, Local shape descriptors, a survey and evaluation, in: Proceedings of...

Yueqing Wang was born in 1988. He received his B.S. degree in Computer Science and Technology from Tsinghua University in 2010 and his M.S. degree in Computer Science and Technology from the National University of Defense Technology in 2012, and he is now a Ph.D. candidate at the National University of Defense Technology. His research interests include high performance computer architecture, parallel computing, and machine learning.

Zhige Xie was born in 1984. He received his M.S. degree in Computer Science and Technology from the National University of Defense Technology in 2010 and is now a Ph.D. candidate there. His research interests include computer graphics and machine learning.

Kai Xu was born in 1982. He is an Assistant Professor in the School of Computer Science, National University of Defense Technology (NUDT). His research interests are in the areas of computer graphics, especially geometry processing. His current topics of interest include shape analysis, high-level geometry processing, and data-driven shape modeling.

Yong Dou was born in 1966. He is a professor, Ph.D. supervisor, and senior member of the China Computer Federation (E200009248). He received his B.S., M.S., and Ph.D. degrees in Computer Science and Technology from the National University of Defense Technology in 1995. His research interests include high performance computer architecture, high performance embedded microprocessors, reconfigurable computing, and bioinformatics. He is a member of the IEEE and the ACM.

Yuanwu Lei was born in 1982. He received his B.S. degree in Computer Science and Technology from North China Electric Power University, Baoding, China, in 2005, and his M.S. degree in Computer Science and Technology from the National University of Defense Technology in 2007, and he is now a Ph.D. candidate at the National University of Defense Technology. His research interests include high performance computer architecture, high-precision computation, parallel computing, and reconfigurable computing.
