Scalable scene understanding via saliency consensus

  • Methodologies and Application
Soft Computing

Abstract

Given a single image, we propose a scene understanding framework that segments and categorizes the objects in the scene and classifies the overall scene. A handful of frameworks already perform these tasks coherently, but training these models is time-consuming, which limits their scalability. This paper presents a scalable framework that adopts an object-based approach and sequentially performs unsupervised object discovery using multiple saliency detection algorithms, object segmentation by graph cut, object classification using the bag-of-features model, and, lastly, scene classification by binary decision trees. A novel region-of-interest (ROI) detector, based on morphological image processing techniques, is proposed to automatically provide object location priors from saliency maps. Additionally, to improve object discovery, multiple saliency detectors are combined using a novel method to produce the ROI map, which is then used to obtain the segmentation. We tested our system on a novel object-based scene dataset and obtained high classification accuracy using the proposed object discovery step. Unlike existing frameworks, the proposed algorithm remains scalable because its object discovery step is fully unsupervised, and it can therefore easily accommodate more objects and scene categories.
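To make the object discovery idea concrete, the following is a minimal sketch of fusing several saliency maps into one ROI. The abstract does not specify the fusion rule or the morphological ROI detector, so this sketch assumes plain pixel-wise averaging of normalised maps, followed by thresholding and a largest-connected-component search in place of the morphological step. The function names (`normalise`, `consensus`, `largest_roi`), the threshold of 0.5, and the toy maps are all hypothetical and are not taken from the paper.

```python
from collections import deque


def normalise(sal):
    """Rescale one saliency map (a list of rows) to the range [0, 1]."""
    flat = [v for row in sal for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # a constant map becomes all zeros
    return [[(v - lo) / span for v in row] for row in sal]


def consensus(maps):
    """Average the normalised maps pixel by pixel (assumed fusion rule)."""
    norm = [normalise(m) for m in maps]
    h, w = len(norm[0]), len(norm[0][0])
    return [[sum(m[y][x] for m in norm) / len(norm) for x in range(w)]
            for y in range(h)]


def largest_roi(fused, thresh=0.5):
    """Return (top, left, bottom, right) of the largest 4-connected region
    whose fused saliency is at least `thresh`, or None if there is none."""
    h, w = len(fused), len(fused[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or fused[y0][x0] < thresh:
                continue
            # Breadth-first flood fill from this unvisited salient pixel.
            seen[y0][x0] = True
            queue, region = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and fused[ny][nx] >= thresh):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    if not best:
        return None
    ys = [p[0] for p in best]
    xs = [p[1] for p in best]
    return (min(ys), min(xs), max(ys), max(xs))


# Two toy saliency maps on different scales (hypothetical data).
# Both agree on the 2x2 block; only map_a responds at the bottom-right pixel.
map_a = [
    [0, 0, 0, 0, 0],
    [0, 8, 9, 0, 0],
    [0, 9, 10, 0, 0],
    [0, 0, 0, 0, 6],
]
map_b = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.9, 1.0, 0.0, 0.0],
    [0.0, 0.8, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

fused = consensus([map_a, map_b])
roi = largest_roi(fused)
print(roi)
```

In this toy case the response that only one detector produces is averaged down below the threshold, so the ROI covers just the block on which both maps agree. In a pipeline like the one described, such a box could serve as the location prior for a graph-cut segmentation, although how the paper passes the prior to the segmenter is not stated in the abstract.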

Notes

  1. https://sites.google.com/site/bharathramesh03/gallery.

  2. A demo can be found at https://sites.google.com/site/bharathramesh03/research.

Author information

Corresponding author

Correspondence to Bharath Ramesh.

Ethics declarations

Conflict of interest

All the authors declare that there is no conflict of interest.

Human and animal rights

This article does not contain any studies with animal or human participants performed by any of the authors.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Ramesh, B., Jian, N.L.Z., Chen, L. et al. Scalable scene understanding via saliency consensus. Soft Comput 23, 2429–2443 (2019). https://doi.org/10.1007/s00500-017-2939-2

  • DOI: https://doi.org/10.1007/s00500-017-2939-2
