Abstract
Hand-held object recognition is an important research topic in image understanding and plays an essential role in human-machine interaction. With readily available RGB-D devices, depth information greatly improves object segmentation and provides an additional information channel. However, how to extract a representative and discriminative feature from the object region and efficiently exploit the depth information is crucial for improving hand-held object recognition accuracy and, ultimately, the human-machine interaction experience. In this paper, we focus on a specific but important problem, RGB-D hand-held object recognition, and propose a hierarchical feature learning framework for this task. First, our framework learns modality-specific features from RGB and depth images using CNN architectures with different network depths and learning strategies. Second, a high-level feature learning network is applied to obtain a comprehensive feature representation. Unlike previous work on feature learning and representation, the hierarchical learning method fully exploits the characteristics of the different modalities and efficiently fuses them in a unified framework. Experimental results on the HOD dataset demonstrate the effectiveness of the proposed method.
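
To make the two-stage design concrete, the following is a minimal sketch of the idea in PyTorch: two modality-specific CNN branches (here given different numbers of convolutional stages for RGB and depth) produce per-modality features, and a small high-level network learns the fused representation. The branch depths, feature dimension, class count, and module names (`ModalityBranch`, `HierarchicalRGBDNet`) are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch (not the authors' released code) of the hierarchical
# RGB-D feature learning idea described in the abstract.
# Branch depths, feature dimensions and the class count are illustrative.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """One conv + ReLU + pooling stage."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class ModalityBranch(nn.Module):
    """Modality-specific CNN; the RGB and depth branches may use
    different depths (numbers of conv stages), as the abstract suggests."""

    def __init__(self, in_channels, num_stages, feat_dim=256):
        super().__init__()
        stages, ch = [], in_channels
        for i in range(num_stages):
            out_ch = 32 * (2 ** i)
            stages.append(conv_block(ch, out_ch))
            ch = out_ch
        self.features = nn.Sequential(*stages)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(ch, feat_dim)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.fc(x)


class HierarchicalRGBDNet(nn.Module):
    """Fuse modality-specific features with a high-level learning network."""

    def __init__(self, num_classes=10, feat_dim=256):
        super().__init__()
        self.rgb_branch = ModalityBranch(3, num_stages=4, feat_dim=feat_dim)
        self.depth_branch = ModalityBranch(1, num_stages=3, feat_dim=feat_dim)
        self.fusion = nn.Sequential(           # high-level feature learning
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_classes),
        )

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.fusion(f)


if __name__ == "__main__":
    net = HierarchicalRGBDNet(num_classes=10)
    rgb = torch.randn(2, 3, 128, 128)     # batch of RGB crops of the object region
    depth = torch.randn(2, 1, 128, 128)   # corresponding depth maps
    print(net(rgb, depth).shape)          # torch.Size([2, 10])
```

Keeping separate branches with different depths mirrors the abstract's point that RGB and depth call for modality-specific networks and learning strategies; the sketch only covers the forward architecture, not the modality-specific training procedure.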





Acknowledgments
This work was supported in part by the National Basic Research 973 Program of China under Grant No. 2012CB316400, the National Natural Science Foundation of China under Grant Nos. 61532018 and 61322212, and the National High Technology Research and Development 863 Program of China under Grant No. 2014AA015202. This work was also funded by the Lenovo Outstanding Young Scientists Program (LOYS).
Cite this article
Lv, X., Liu, X., Li, X. et al. Modality-specific and hierarchical feature learning for RGB-D hand-held object recognition. Multimed Tools Appl 76, 4273–4290 (2017). https://doi.org/10.1007/s11042-016-3375-5