ABSTRACT
A scene is usually an abstract concept composed of several less abstract entities, such as objects or themes. Reasoning about scenes directly from visual features is difficult because of the semantic gap between abstract scenes and low-level visual features. Some works instead recognize scenes with a two-step framework that represents images through intermediate representations of objects or themes. However, objects that co-occur across different scenes can make scene recognition ambiguous. In this paper, we propose a framework that represents images with intermediate object representations augmented with spatial layout, i.e., an object-to-object relation (OOR) representation. To better capture spatial information, the proposed OOR is adapted to RGB-D data. In the proposed framework, we first apply an object detection technique to the RGB and depth images separately. The detection results of both modalities are then combined by an RGB-D proposal fusion process. Based on the detection results, we extract the semantic OOR feature and regional convolutional neural network (CNN) features located by the bounding boxes. Finally, the different features are concatenated and fed to the classifier for scene recognition. Experimental results on the SUN RGB-D and NYUD2 datasets illustrate the efficiency of the proposed method.
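The pipeline described in the abstract (per-modality detection, fusion, relation features, concatenation) can be illustrated with a minimal sketch. The abstract does not specify the fusion rule or how OOR is computed, so everything below is an assumption made for illustration: the `Detection` record, the greedy same-class IoU suppression used as a stand-in for RGB-D proposal fusion, the six coarse relations in `RELATIONS`, and the per-box mean `depth` value are all hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from itertools import permutations

import numpy as np


@dataclass
class Detection:
    label: int          # object class index
    box: tuple          # (x1, y1, x2, y2) in pixels
    score: float        # detector confidence
    depth: float = 0.0  # assumed: mean depth inside the box, in metres


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_detections(rgb_dets, depth_dets, iou_thresh=0.5):
    """Assumed fusion rule: pool both modalities, then keep the highest-scoring
    box among same-class boxes that overlap by at least iou_thresh."""
    kept = []
    for det in sorted(rgb_dets + depth_dets, key=lambda d: d.score, reverse=True):
        if all(k.label != det.label or iou(k.box, det.box) < iou_thresh for k in kept):
            kept.append(det)
    return kept


# Hypothetical relation vocabulary; the paper may define relations differently.
RELATIONS = ("left_of", "right_of", "above", "below", "in_front_of", "behind")


def centre(box):
    return (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0


def oor_feature(dets, num_classes):
    """Count, for every ordered class pair, how often each relation holds."""
    feat = np.zeros((num_classes, num_classes, len(RELATIONS)), dtype=np.float32)
    for a, b in permutations(dets, 2):
        (ax, ay), (bx, by) = centre(a.box), centre(b.box)
        # Image y grows downwards, so a smaller y means "above";
        # a smaller depth means closer to the camera ("in front of").
        holds = [ax < bx, ax > bx, ay < by, ay > by, a.depth < b.depth, a.depth > b.depth]
        feat[a.label, b.label] += np.asarray(holds, dtype=np.float32)
    return feat.ravel()


def scene_descriptor(dets, cnn_feature, num_classes):
    """Concatenate the relation feature with a precomputed regional CNN feature."""
    return np.concatenate([oor_feature(dets, num_classes), cnn_feature])
```

A descriptor built this way has a fixed length of `num_classes * num_classes * len(RELATIONS)` plus the CNN feature length, so it can be passed to any standard classifier. The counting scheme is only one of many ways to encode pairwise layout.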
Index Terms
RGB-D Scene Recognition with Object-to-Object Relation