Abstract
RGB-D images are multimodal data. Previous work has shown that using color and depth images together can dramatically increase RGB-D based object recognition accuracy. However, most existing methods either simply take all modalities as input, ignoring modality-specific information, or train a first-layer representation for each modality separately and concatenate them, ignoring information that is correlated across modalities. In this paper, we use a variant of the sparse auto-encoder (SAE) that can specify how mode-sparse or mode-dense the features should be. We propose a new deep learning network that combines this SAE variant with recursive neural networks (RNNs). With it, we obtain highly discriminative features and achieve state-of-the-art performance on a standard RGB-D object dataset.
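The abstract does not state the form of the regularizer that controls mode-sparsity. As an illustration only, the sketch below uses a group-lasso style penalty (a swapped-in, generic technique, not necessarily the one used in the paper): for each hidden unit it sums the L2 norms of the weights attached to each modality. The function names, the `groups` layout, and the toy weights are all hypothetical.

```python
import math

def group_norms(weights, groups):
    """For each hidden unit, return the L2 norm of its weights per modality.

    weights: list of rows, one row of input weights per hidden unit.
    groups:  dict mapping a modality name to the input indices it owns.
    """
    return [
        {name: math.sqrt(sum(row[i] ** 2 for i in idx))
         for name, idx in groups.items()}
        for row in weights
    ]

def modality_penalty(weights, groups):
    """Group-lasso style penalty: sum of per-modality L2 norms over all units."""
    return sum(sum(norms.values()) for norms in group_norms(weights, groups))

# Hypothetical layout: inputs 0-2 are color values, inputs 3-4 are depth values.
groups = {"rgb": [0, 1, 2], "depth": [3, 4]}

# Two hidden units with the same overall L2 norm (5.0).
mode_sparse = [[3.0, 4.0, 0.0, 0.0, 0.0]]  # draws on the color inputs only
mode_dense = [[3.0, 0.0, 0.0, 4.0, 0.0]]   # draws on both modalities

print(modality_penalty(mode_sparse, groups))
print(modality_penalty(mode_dense, groups))
```

Under this assumed penalty, the unit that draws on a single modality is penalized less than the unit that spreads the same weight norm over both, so adding the penalty to an auto-encoder's training loss with a larger coefficient would push features toward being mode-sparse, and a smaller coefficient would allow more mode-dense features.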
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bai, J., Wu, Y. (2014). SAE-RNN Deep Learning for RGB-D Based Object Recognition. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theory. ICIC 2014. Lecture Notes in Computer Science, vol 8588. Springer, Cham. https://doi.org/10.1007/978-3-319-09333-8_25
DOI: https://doi.org/10.1007/978-3-319-09333-8_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09332-1
Online ISBN: 978-3-319-09333-8
eBook Packages: Computer Science, Computer Science (R0)