
MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition

  • Original Article
The Visual Computer

Abstract

This paper addresses the problem of object recognition from RGB-D data. Although deep convolutional neural networks have made considerable progress in this area, they still suffer from the lack of large-scale, manually labeled RGB-D data: labeling a large-scale RGB-D dataset is a time-consuming and tedious task. More importantly, such large-scale datasets often exhibit a long tail, and the hard positive examples in the tail can hardly be recognized. To address these problems, we propose a multimodal self-augmentation and adversarial network (MSANet) for RGB-D object recognition, which augments the data effectively at two levels while preserving the annotations. At the first level, a series of transformations is leveraged to generate class-agnostic examples for each instance, which supports the training of our MSANet. At the second level, an adversarial network generates class-specific hard positive examples while learning to classify them correctly, further improving the performance of our MSANet. With these schemes, the proposed approach achieves the best results on several publicly available RGB-D object recognition datasets; for example, our experiments show a 1.5% accuracy gain on the benchmark Washington RGB-D object dataset over the current state of the art.
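The first-level, annotation-preserving augmentation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function name `augment_rgbd`, the particular transforms (rotation, flip, brightness jitter), and their parameter ranges are all assumptions. The essential constraint it illustrates is that the same geometric transform is applied to both modalities, so the class label of the pair is unchanged.

```python
# Hypothetical sketch of class-agnostic augmentation for paired RGB-D data.
# The SAME geometric transform is applied to both modalities so the
# instance's class label is preserved.
import numpy as np

def augment_rgbd(rgb, depth, rng):
    """Apply one randomly chosen label-preserving transform to an RGB-D pair."""
    k = rng.integers(0, 4)                 # random multiple of 90-degree rotation
    rgb = np.rot90(rgb, k, axes=(0, 1))
    depth = np.rot90(depth, k, axes=(0, 1))
    if rng.random() < 0.5:                 # random horizontal flip, applied jointly
        rgb = rgb[:, ::-1]
        depth = depth[:, ::-1]
    # Photometric jitter touches RGB only; geometry (and thus depth) is unchanged.
    rgb = np.clip(rgb * rng.uniform(0.8, 1.2), 0, 255)
    return rgb, depth

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(64, 64, 3)).astype(float)
depth = rng.random((64, 64))
aug_rgb, aug_depth = augment_rgbd(rgb, depth, rng)
```

Because every transform here is label-preserving, each instance yields many training examples at no annotation cost; the second-level adversarial generation of hard positives then targets the long tail that such generic transforms cannot cover.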




Acknowledgements

This work is supported in part by MS-RA CCRP Funding FY16-RES-THEME-039. The authors thank the anonymous reviewers for their helpful comments, which improved this paper.

Author information

Corresponding author

Correspondence to Xukun Shen.

About this article

Cite this article

Zhou, F., Hu, Y. & Shen, X. MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition. Vis Comput 35, 1583–1594 (2019). https://doi.org/10.1007/s00371-018-1559-x
