Abstract
RGB-D object recognition is a challenging and important task in computer vision. Convolutional neural networks (CNNs) are widely used to extract features from the RGB and depth modalities separately, an approach that cannot fully exploit the complementary information between the modalities. Moreover, conventional CNN training relies on iterative gradient-descent search and often suffers from slow convergence and local minima. To address these problems, we propose a Joint Deep Random Kernel Convolution and ELM (JDRKC-ELM) method for object recognition, which integrates the feature-extraction power of CNNs with the fast training of the extreme learning machine auto-encoder (ELM-AE). JDRKC-ELM learns feature representations directly from raw RGB-D data. In this structure, a Random Kernel Convolutional Neural Network (RKCNN) extracts lower-level features from the RGB and depth modalities separately. A feature fusion layer then combines the features from the two modalities and feeds the fused features to a Double-layer ELM-AE (DLELM-AE), which produces higher-level features. Finally, the resulting feature representations are passed to a standard ELM for object classification. We evaluate JDRKC-ELM on the RGB-D Object Dataset. The results show that the proposed method achieves high recognition accuracy and good generalization performance in comparison with deep learning methods and other ELM methods.
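The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-channel toy images as stand-ins for the RGB and depth inputs, a single random-kernel convolution and average-pooling stage per modality, simple concatenation as the fusion layer, and the commonly used closed-form ridge solution for ELM-AE and ELM output weights. All function names, kernel counts, hidden-layer sizes, and the regularization constant C are hypothetical choices made for illustration.

```python
import numpy as np

def conv_pool(images, kernels, pool=2):
    """Filter with fixed random kernels (valid cross-correlation), apply tanh, average-pool."""
    k = kernels.shape[-1]
    # windows has shape (n, h-k+1, w-k+1, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(images, (k, k), axis=(1, 2))
    maps = np.tanh(np.einsum("nijab,kab->nkij", windows, kernels))
    n, c, h, w = maps.shape
    oh, ow = h // pool, w // pool
    pooled = maps[:, :, :oh * pool, :ow * pool]
    pooled = pooled.reshape(n, c, oh, pool, ow, pool).mean(axis=(3, 5))
    return pooled.reshape(n, -1)

def ridge_solve(H, T, C):
    """Closed-form regularized least squares: (H^T H + I/C)^{-1} H^T T."""
    return np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C, H.T @ T)

def fit_elm_ae(X, n_hidden, C, rng):
    """ELM auto-encoder: random hidden layer, output weights trained to reconstruct X."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    return ridge_solve(H, X, C)  # beta has shape (n_hidden, n_features)

def fit_elm(X, y, n_classes, n_hidden, C, rng):
    """Standard ELM classifier: random hidden layer, ridge-solved output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    T = np.eye(n_classes)[y]  # one-hot targets
    return W, b, ridge_solve(H, T, C)

def predict_elm(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy data (assumption): 40 samples of 8x8 single-channel "RGB" and "depth" images.
rng = np.random.default_rng(0)
n, size = 40, 8
y = np.arange(n) % 2
rgb = rng.standard_normal((n, size, size)) + y[:, None, None]
depth = rng.standard_normal((n, size, size)) - y[:, None, None]

# 1) Random-kernel convolution per modality (kernels are never trained).
k_rgb = rng.standard_normal((4, 3, 3))
k_depth = rng.standard_normal((4, 3, 3))
# 2) Fusion layer, here plain concatenation of the two feature vectors.
fused = np.hstack([conv_pool(rgb, k_rgb), conv_pool(depth, k_depth)])

# 3) Two stacked ELM-AE layers; each maps its input through beta^T.
beta1 = fit_elm_ae(fused, 32, 1e3, rng)
h1 = fused @ beta1.T
beta2 = fit_elm_ae(h1, 16, 1e3, rng)
h2 = h1 @ beta2.T

# 4) Standard ELM classifier on the final representation.
W, b, beta = fit_elm(h2, y, 2, 64, 1e3, rng)
pred = predict_elm(h2, W, b, beta)
```

Every trained weight in this sketch comes from a single linear solve, with no gradient-descent iterations, which is the property the abstract contrasts with conventional CNN training.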










Acknowledgements
This work was funded by the National Natural Science Foundation of China (Grant No. 61402368). The authors thank the reviewers for their helpful comments, which improved the paper.
Cite this article
Yin, Y., Li, H. RGB-D object recognition based on the joint deep random kernel convolution and ELM. J Ambient Intell Human Comput 11, 4337–4346 (2020). https://doi.org/10.1007/s12652-018-1067-x
DOI: https://doi.org/10.1007/s12652-018-1067-x