Abstract
To fully exploit the feature information in RGB-D images, currently popular algorithms mainly use convolutional neural networks (CNNs) for both feature extraction and classification. Such methods can achieve impressive results, but usually at the cost of an extremely large and complex network. Moreover, the fully connected layers of a CNN form a classical neural network classifier trained by gradient descent-based methods, so its generalization ability is limited and sub-optimal. To address these problems, this paper introduces a multi-view CNN-SPMP-RNN-ELM (MCSPMPR-ELM) model for RGB-D object recognition, which combines the representational power of MCSPMPR with the fast training of the extreme learning machine (ELM). The model uses the MCSPMPR algorithm to extract discriminative features from raw RGB images and depth images separately. The extracted features are then fed to a nonlinear ELM classifier, which yields better generalization performance with faster learning speed. Finally, co-training is employed to learn from unlabeled data in a semi-supervised manner, using the two distinct feature sets. Experimental results on widely used RGB-D object datasets show that our method achieves competitive performance compared with other state-of-the-art algorithms specifically designed for RGB-D data.
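To make the ELM classification step concrete, the following is a minimal sketch of a single-hidden-layer ELM classifier in NumPy. It is not the authors' implementation: the MCSPMPR feature extractor is not reproduced, the input is assumed to be an already-extracted feature matrix, and the class name ELMClassifier, the tanh activation, the hidden-layer size, and the regularization value are all illustrative assumptions. The point it illustrates is that the hidden-layer weights are drawn at random and left untrained, while the output weights come from one regularized least-squares solve rather than from gradient descent.

```python
import numpy as np


class ELMClassifier:
    """Illustrative single-hidden-layer ELM (hypothetical helper, not from the paper)."""

    def __init__(self, n_hidden=200, reg=1e-2, seed=0):
        self.n_hidden = n_hidden      # number of random hidden units (assumed value)
        self.reg = reg                # ridge regularization strength (assumed value)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Nonlinear random feature map: H = tanh(X W + b)
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Input weights and biases are sampled once and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Targets encoded as +1 for the true class and -1 elsewhere.
        T = np.where(y[:, None] == self.classes_[None, :], 1.0, -1.0)
        # Closed-form output weights: beta = (H^T H + reg * I)^{-1} H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

In a multi-view setting such as the one described above, one such classifier could be fitted per feature set (RGB and depth), which is also the arrangement that co-training relies on; how the paper combines the two views is not specified in the abstract.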
Acknowledgements
This work is funded by the National Natural Science Foundation of China (Grant No. 61402368) and the Science and Technology on Transient Impact Laboratory Foundation (Grant No. 61426060103162606007). The authors thank the anonymous reviewers for their very helpful comments, which improved the paper.
Cite this article
Yin, Y., Li, H. Multi-view CSPMPR-ELM feature learning and classifying for RGB-D object recognition. Cluster Comput 22 (Suppl 4), 8181–8191 (2019). https://doi.org/10.1007/s10586-018-1695-0
DOI: https://doi.org/10.1007/s10586-018-1695-0