Abstract
Stereoscopic images have become increasingly prevalent following rapid advances in 3D capture and display techniques. However, there has been little research on visual content analysis for stereoscopic images. In this paper, we address the challenging problem of object detection and classification for stereoscopic images. We propose an iterative method in which salient object detection and object classification mutually boost each other. The method consists of two steps. In the first step, a 3D saliency detection method, which combines the contrastive and occlusion cues contained in each stereoscopic image pair with the discriminative features provided by the SVM classifier, localizes the object of interest in the stereoscopic images. In the second step, the bag-of-words features of the foreground and background are pooled using the localization information and then used to train the SVM classifier. Each step benefits from the gradual improvement of the other, in both the training and the testing process. To evaluate the performance of our approach, we set up a six-class dataset of stereoscopic images of real objects viewed under general lighting conditions, poses and viewpoints. Our experimental results on this dataset, for both object localization and object classification, demonstrate the effectiveness of the method.
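The alternation between the two steps can be illustrated with a minimal sketch. This is not the authors' implementation: the bottom-up saliency (standing in for the contrastive and occlusion cues) is assumed to be precomputed per patch, the classifier is a small binary linear SVM trained by subgradient descent, and every name here (`pool_bow`, `train_linear_svm`, `refine_saliency`, `alternate`, the `threshold` and `alpha` parameters) is hypothetical.

```python
import numpy as np

def pool_bow(codes, saliency, threshold=0.5):
    """Pool codeword assignments separately over foreground and background.

    codes:    (n_patches, n_words) codeword assignments per patch
    saliency: (n_patches,) values in [0, 1]; threshold is an assumed cut-off
    """
    fg = saliency >= threshold
    feat = np.concatenate([codes[fg].sum(axis=0), codes[~fg].sum(axis=0)])
    return feat / max(feat.sum(), 1e-12)  # normalise the joint histogram

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Binary linear SVM (hinge loss, L2 penalty) by subgradient descent.

    y must hold labels in {-1, +1}. A stand-in for any SVM solver.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        active = y * (X @ w) < 1  # samples violating the margin
        grad = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        w -= lr * grad
    return w

def refine_saliency(base_saliency, codes, w, alpha=0.5):
    """Blend bottom-up saliency with per-patch classifier evidence.

    Only the foreground half of w is used; alpha is an assumed mixing weight.
    """
    n_words = codes.shape[1]
    top_down = 1.0 / (1.0 + np.exp(-(codes @ w[:n_words])))
    return (1.0 - alpha) * base_saliency + alpha * top_down

def alternate(images, labels, n_iters=3):
    """Alternate between pooling/training and saliency refinement."""
    saliency = [img["base_saliency"].copy() for img in images]
    w = None
    for _ in range(n_iters):
        # Step 2 of the abstract: pool features with the current localization.
        X = np.stack([pool_bow(img["codes"], s)
                      for img, s in zip(images, saliency)])
        w = train_linear_svm(X, labels)
        # Step 1 of the abstract: update saliency with classifier feedback.
        saliency = [refine_saliency(img["base_saliency"], img["codes"], w)
                    for img in images]
    return w, saliency
```

Each image is represented here as a dictionary with a `codes` array and a `base_saliency` array, which is purely an illustrative data layout. A multi-class setting such as the six-class dataset would need one such classifier per class or a multi-class SVM.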








Acknowledgments
We would like to thank the Flickr users and the NVIDIA 3D Vision Live sharers for sharing their photos. We would also like to thank Yuzhen Niu, Yujie Geng, Xueqing Li and Feng Liu for providing the website links.
Cite this article
Kang, K., Cao, Y., Zhang, J. et al. Salient object detection and classification for stereoscopic images. Multimed Tools Appl 75, 1443–1457 (2016). https://doi.org/10.1007/s11042-014-2142-8
DOI: https://doi.org/10.1007/s11042-014-2142-8