Abstract
Traditionally, indoor scene classification has been approached as a 2D image recognition problem. In most visual scene classification systems, a descriptor is computed for the input image to obtain a suitable representation that captures color, shape, or spatial information, and techniques based on a spatial pyramid have proven well suited to this step. In recent years, however, 3D sensors have become widely available, making it possible to add new sources of information to this framework. In this work we use RGB-D data to extend the spatial pyramid approach, with the aim of building descriptors that are more robust to changing lighting conditions. The proposed descriptors are evaluated on the RobotVision@ImageCLEF-2013 benchmark dataset, where they substantially outperform state-of-the-art 3D local and global descriptors.
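The abstract gives no implementation details. The following is a minimal sketch of how the spatial pyramid idea can carry over from image pixels to 3D points. It rests on several assumptions of its own. Each point is assumed to already carry a discrete label, for instance a visual-word index obtained by quantizing a local descriptor. Each pyramid level l is assumed to split the cloud's axis-aligned bounding box into 2^l cells per axis. The per-level weights are borrowed from Lazebnik et al.'s 2D spatial pyramid matching. The function `spatial_pyramid_3d` and its parameters are hypothetical, and this is not the authors' descriptor.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def spatial_pyramid_3d(points: Sequence[Point], labels: Sequence[int],
                       num_bins: int, levels: int = 2) -> List[float]:
    """Hypothetical 3D spatial pyramid: concatenated per-cell label histograms.

    Assumption: level l splits the bounding box into 2**l cells per axis
    (8**l cells in total), and every point carries a label in [0, num_bins).
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if not points:
        raise ValueError("empty point cloud")
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    descriptor: List[float] = []
    for level in range(levels + 1):
        cells = 2 ** level
        hist = [[0.0] * num_bins for _ in range(cells ** 3)]
        for p, lab in zip(points, labels):
            if not 0 <= lab < num_bins:
                raise ValueError("label out of range")
            idx = 0
            for a in range(3):
                extent = hi[a] - lo[a]
                # Map the coordinate to a cell index in [0, cells - 1];
                # points on the upper boundary are clamped into the last cell.
                c = int((p[a] - lo[a]) / extent * cells) if extent > 0 else 0
                idx = idx * cells + min(c, cells - 1)
            hist[idx][lab] += 1.0
        # Weights borrowed (as an assumption) from 2D spatial pyramid matching:
        # 1/2**L at level 0 and 1/2**(L - l + 1) at level l >= 1.
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        for h in hist:
            descriptor.extend(weight * v for v in h)
    # L1-normalize so that clouds with different point counts are comparable.
    total = sum(descriptor)
    return [v / total for v in descriptor] if total > 0 else descriptor


# Usage: two labelled points at opposite corners of a unit cube.
d = spatial_pyramid_3d([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)], [0, 1],
                       num_bins=2, levels=1)
print(len(d))  # 2 bins * (1 + 8) cells = 18
```

Each level multiplies the number of cells by eight. The descriptor length is therefore num_bins * (1 + 8 + ... + 8^L), which keeps only shallow pyramids practical in this formulation.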
Acknowledgments
This work has been partially funded by FEDER funds and the Spanish Government (MICINN) through project TIN2013-46638-C3-3-P and by Consejería de Educación, Cultura y Deportes of the JCCM regional government through project PPII-2014-015-P. Cristina Romero-González is also funded by the MECD grant FPU12/04387, and Jesus Martínez-Gómez is also funded by the JCCM grant POST2014/8171.
About this article
Cite this article
Romero-González, C., Martínez-Gómez, J., García-Varea, I. et al. 3D spatial pyramid: descriptors generation from point clouds for indoor scene classification. Machine Vision and Applications 27, 263–273 (2016). https://doi.org/10.1007/s00138-015-0744-4
DOI: https://doi.org/10.1007/s00138-015-0744-4