Abstract
We propose a scene understanding framework that, given a single image, segments and categorizes the objects in the scene and classifies the overall scene. A handful of frameworks already perform these tasks coherently, but training these models is time-consuming, which limits their scalability. This paper presents a scalable framework that adopts an object-based approach and sequentially performs unsupervised object discovery using multiple saliency detection algorithms, object segmentation by graph cut, object classification using the bag-of-features model, and, lastly, scene classification by binary decision trees. A novel region-of-interest (ROI) detector, based on morphological image processing techniques, is proposed to automatically provide object location priors from saliency maps. Additionally, to improve object discovery, multiple saliency detectors are combined using a novel method to produce the ROI map, which is then used to obtain the segmentation. We tested our system on a novel object-based scene dataset and obtained high classification accuracy using the proposed object discovery step. Unlike existing frameworks, the proposed algorithm remains scalable because its object discovery step is fully unsupervised, and it can therefore easily accommodate more object and scene categories.
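As a rough illustration of the object discovery step described above, the following sketch fuses several saliency maps and derives a single ROI bounding box using morphological opening. The fusion rule (a plain average of min-max normalized maps), the 0.5 threshold, the 3x3 structuring element, and all function names are assumptions made for illustration only; the abstract does not specify the paper's actual consensus method or ROI detector.

```python
import numpy as np


def normalize(saliency):
    """Rescale a saliency map to the range [0, 1]."""
    s = np.asarray(saliency, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)


def consensus_map(saliency_maps):
    """Average the normalized maps (an assumed, deliberately simple fusion rule)."""
    return np.mean([normalize(m) for m in saliency_maps], axis=0)


def _shifted_windows(mask):
    """Return the nine 3x3-neighbourhood shifts of a boolean mask (False-padded)."""
    padded = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    return [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]


def binary_opening(mask):
    """3x3 erosion followed by 3x3 dilation; removes isolated specks."""
    eroded = np.logical_and.reduce(_shifted_windows(mask))
    return np.logical_or.reduce(_shifted_windows(eroded))


def roi_bounding_box(saliency_maps, threshold=0.5):
    """Return (top, left, bottom, right) of the opened consensus mask, or None.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    mask = binary_opening(consensus_map(saliency_maps) >= threshold)
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return int(rows.min()), int(cols.min()), int(rows.max()) + 1, int(cols.max()) + 1
```

The returned box could serve as a location prior for a graph-cut segmentation stage; a fuller version would label connected components so that several objects yield separate boxes.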
Ethics declarations
Conflict of interest
All the authors declare that there is no conflict of interest.
Human and animal rights
This article does not contain any studies with animal or human participants performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Ramesh, B., Jian, N.L.Z., Chen, L. et al. Scalable scene understanding via saliency consensus. Soft Comput 23, 2429–2443 (2019). https://doi.org/10.1007/s00500-017-2939-2
DOI: https://doi.org/10.1007/s00500-017-2939-2