Abstract
Because depth information has proven effective in scene classification, RGB-D sensor-based scene classification has received wide attention. However, when images are corrupted by noise during transmission, the recognition rate declines significantly. Furthermore, after feature representation schemes are applied, the concatenated features extracted from each RGB and depth image pair have very high dimensionality. This paper therefore presents a new dimensionality reduction algorithm called Cauchy estimator discriminant learning (CEDL). CEDL addresses two goals simultaneously: (1) to reduce, to some extent, the negative influence of noise in the input samples; and (2) to preserve the local and global geometric structure of the input samples. Experiments on the frequently used NYU Depth V1 dataset suggest that CEDL is effective compared with other state-of-the-art scene classification methods.
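The robustness goal stated in the abstract can be illustrated with the standard Cauchy loss from robust statistics. The block below is a minimal sketch under stated assumptions, not the CEDL objective itself, which the abstract does not spell out. The function names and the scale parameter `c` are illustrative choices. It compares the squared loss with the Cauchy loss and shows the corresponding reweighting factor, which shrinks toward zero as a residual grows.

```python
import math


def squared_loss(r):
    """Ordinary least-squares loss: grows quadratically with the residual."""
    return 0.5 * r * r


def cauchy_loss(r, c=1.0):
    """Standard Cauchy loss: grows only logarithmically for large residuals.

    The scale c is an assumed tuning parameter, not a value from the paper.
    """
    return 0.5 * c * c * math.log1p((r / c) ** 2)


def cauchy_weight(r, c=1.0):
    """Reweighting factor psi(r) / r = 1 / (1 + (r / c)^2).

    Samples with large residuals (likely noise) receive small weights.
    """
    return 1.0 / (1.0 + (r / c) ** 2)


if __name__ == "__main__":
    # Small, moderate, and large residuals; the last mimics a noisy sample.
    for r in (0.1, 1.0, 10.0):
        print(
            f"r={r:5.1f}  squared={squared_loss(r):8.3f}  "
            f"cauchy={cauchy_loss(r):6.3f}  weight={cauchy_weight(r):6.4f}"
        )
```

For small residuals the two losses nearly coincide, while for large residuals the Cauchy loss is far smaller than the squared loss. This is the general mechanism by which a Cauchy estimator limits the influence of noisy samples.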
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 61572486, 61402458, 61301242, 61271407 and 61263048; the Guangdong Natural Science Funds under Grant 2014A030310252; the Shenzhen Technology Project under Grant JCYJ20140901003939001; the Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering Program under Grant 2014KLA01; the Young and Middle-Aged Backbone Teachers’ Cultivation Plan of Yunnan University under Grant XT412003; and the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China), under Grant 14CX02203A.
Cite this article
Tao, D., Yang, X., Liu, W. et al. Cauchy Estimator Discriminant Learning for RGB-D Sensor-based Scene Classification. Multimed Tools Appl 76, 4471–4489 (2017). https://doi.org/10.1007/s11042-016-3370-x
DOI: https://doi.org/10.1007/s11042-016-3370-x