Abstract
Both the depth and the intensity of a pixel play a significant role in labeling objects in 3D environments. This paper presents a novel approach to labeling objects in multi-view video sequences by incorporating rich depth information. The depth map of a scene is estimated from focus cues using the Gaussian–Hermite moments (GHMs) of local neighboring pixels. The depth map obtained from GHMs is expected to provide more robust features than other popular depth maps, such as those obtained from Kinect and from the defocus cue. We use the rich depth and intensity values of a pixel to score every point of a video frame and generate labeled probability maps in a 3D environment. These maps are then used to create a 3D scene in which the available objects are labeled distinctively. Experimental results show that the proposed approach yields excellent object-labeling performance on different multi-view scenes taken from the RGB-D object dataset, with significant improvements in precision–recall characteristics and F1-score.
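As a rough illustration of how GHMs of a local neighborhood can act as a focus cue, the sketch below computes discrete 2D Gaussian–Hermite moments of a small patch and sums the energy of the low-order moments. It is a minimal sketch under stated assumptions, not the estimator used in the paper: the function names, the window size, the scale `sigma`, the mean removal, and the energy-based score are all hypothetical choices made for illustration.

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def gh_basis(order, radius, sigma):
    """Gaussian-Hermite function of the given order, sampled at x = -radius..radius."""
    norm = 1.0 / math.sqrt((2 ** order) * math.factorial(order)
                           * math.sqrt(math.pi) * sigma)
    return [norm * math.exp(-(x * x) / (2.0 * sigma * sigma))
            * hermite(order, x / sigma)
            for x in range(-radius, radius + 1)]

def gh_moment(patch, p, q, sigma):
    """Discrete 2D GHM of order (p, q) of a square patch with odd side length."""
    size = len(patch)
    radius = size // 2
    bx = gh_basis(p, radius, sigma)  # basis along columns (x)
    by = gh_basis(q, radius, sigma)  # basis along rows (y)
    return sum(patch[i][j] * by[i] * bx[j]
               for i in range(size) for j in range(size))

def focus_energy(patch, max_order=3, sigma=1.5):
    """Hypothetical focus score (an assumption, not the paper's measure):
    energy of the moments with 1 <= p + q <= max_order after mean removal."""
    size = len(patch)
    mean = sum(sum(row) for row in patch) / (size * size)
    centered = [[v - mean for v in row] for row in patch]
    return sum(gh_moment(centered, p, q, sigma) ** 2
               for p in range(max_order + 1)
               for q in range(max_order + 1)
               if 1 <= p + q <= max_order)

# Toy 7x7 patches: a featureless region and a vertical step edge.
flat = [[0.5] * 7 for _ in range(7)]
edge = [[0.0] * 3 + [1.0] * 4 for _ in range(7)]

flat_score = focus_energy(flat)
edge_score = focus_energy(edge)
```

In this toy setup the featureless patch scores zero and the step edge scores above zero, which is the behavior a focus cue needs: local structure raises the response. Turning such per-pixel responses into a depth map requires further steps that the abstract does not detail.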
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Enan, S.S., Mahbubur Rahman, S.M., Haque, S., Howlader, T., Hatzinakos, D. (2020). Object Labeling in 3D from Multi-view Scenes Using Gaussian–Hermite Moment-Based Depth Map. In: Chaudhuri, B., Nakagawa, M., Khanna, P., Kumar, S. (eds) Proceedings of 3rd International Conference on Computer Vision and Image Processing. Advances in Intelligent Systems and Computing, vol 1024. Springer, Singapore. https://doi.org/10.1007/978-981-32-9291-8_8
DOI: https://doi.org/10.1007/978-981-32-9291-8_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9290-1
Online ISBN: 978-981-32-9291-8
eBook Packages: Engineering, Engineering (R0)