Abstract
Salient object segmentation is an important computer vision problem having applications in numerous areas such as video surveillance, scene parsing, autonomous navigation etc. For images, this task is quite challenging due to clutter/texture present in the background, low resolution and/or low contrast of the object(s) of interest etc. In case of videos, additional issues such as object deformation, camera motion and presence of multiple moving objects make the foreground object segmentation a significantly difficult and open problem. However, motion pattern can also act as an important cue to identify the foreground objects against the background. This is exploited by the recent approaches via aggregation of temporally perturbed information from a series of consecutive frames. Unfortunately for images, this additional cue is not available. In this paper, we propose to emulate the effect of such perturbations by constructing a bag of multiple augmentations applied on a single input image. Saliency features are estimated independently from each perturbed image in this bag, which are further combined using a novel aggregation strategy based on a convolutional gated recurrent encoder-decoder unit. Through extensive experiments on the benchmark datasets, we show better or very competitive performance when compared with the state-of-the-art methods. We further observe that even with a bag constructed using simple affine transformations, we achieve impressive performances, proving the robustness of the proposed framework.
Similar content being viewed by others
References
Achanta R, Hemami S, Estrada F, Susstrunk S (2009) Frequency-tuned salient region detection. In: 2009. cvpr 2009. IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp 1597–1604
Baradel F, Wolf C, Mille J, Taylor GW (2018) Glimpse clouds: Human activity recognition from unstructured feature points. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 469–478
Batra D, Kowdle A, Parikh D, Luo J, Chen T (2010) icoseg: Interactive co-segmentation with intelligent scribble guidance. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp 3169–3176
Borji A, Sihite DN, Itti L (2012) Salient object detection: a benchmark. In: Computer Vision–ECCV 2012, Springer, pp 414–429
Caelles S, Maninis KK, Pont-Tuset J, Leal-Taixé L, Cremers D, Van Gool L (2017) One-shot video object segmentation. In: CVPR 2017. IEEE
Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 801–818
Cheng J, Tsai YH, Wang S, Yang MH (2017) Segflow: Joint learning for video object segmentation and optical flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 686–695
Cheng MM, Hou QB, Zhang SH, Rosin PL (2017) Intelligent visual media processing: When graphics meets vision. J Comput Sci Technol 32(1):110–121
Cheng MM, Zhang FL, Mitra NJ, Huang X, Hu SM (2010) Repfinder: Finding approximately repeated scene elements for image editing. In: ACM Transactions on Graphics (TOG), vol 29. ACM, p 83
Craye C, Filliat D, Goudou JF (2016) Environment exploration for object-based visual saliency learning. In: 2016 IEEE International Conference on Robotics and automation (ICRA), IEEE, pp 2303–2309
Deng J, Guo J, Xue N, Zafeiriou S (2019) Arcface: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4690–4699
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2012) The PASCAL Visual Object Classes Challenge (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html
Faktor A, Irani M (2014) Video segmentation by non-local consensus voting. In: Proceedings of the British Machine Vision Conference, BMVA Press
Gao D, Vasconcelos N (2005) Discriminant saliency for visual recognition from cluttered scenes. In: Advances in Neural Information Processing Systems, pp 481–488
Gao Y, Wang M, Tao D, Ji R, Dai Q (2012) 3-d object retrieval and recognition with hypergraph analysis. IEEE Trans Image Process 21(9):4290–4303
Ghiasi G, Lin TY, Le QV (2019) Nas-fpn: Learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7036–7045
Guo C, Zhang L (2010) A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE Trans Image Process 19(1):185–198
He J, Feng J, Liu X, Cheng T, Lin TH, Chung H, Chang SF (2012) Mobile product search with bag of hash bits and boundary reranking. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp 3005–3012
Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670
Hou Q, Cheng MM, Hu XW, Borji A, Tu Z, Torr P (2017) Deeply supervised salient object detection with short connections. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Itti L (2004) Automatic foveation for video compression using a neurobiological model of visual attention. IEEE Trans Image Process 13(10):1304–1318
Jain SD, Xiong B, Grauman K (2017) Fusionseg: Learning to combine motion and appearance for fully automatic segmention of generic objects in videos. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Jampani V, Gadde R, Gehler PV (2017) Video propagation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Jang WD, Kim CS (2017) Online video object segmentation via convolutional trident network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5849– 5858
Jiang M, Huang S, Duan J, Zhao Q (2015) Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1072–1080
Keuper M, Andres B, Brox T (2015) Motion trajectory segmentation via minimum cost multicuts. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3271–3279
Khoreva A, Perazzi F, Benenson R, Schiele B, Sorkine-Hornung A (2017) Learning video object segmentation from static images. In: 30Th International Conference on Computer Vision and Pattern Recognition
Koh YJ, Kim CS (2017) Primary object segmentation in videos based on region augmentation and reduction. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp 7417–7425
Krähenbühl P, Koltun V (2011) Efficient inference in fully connected crfs with gaussian edge potentials. In: Advances in Neural Information Processing Systems, pp 109–117
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp 1097–1105
Kruthiventi SS, Gudisa V, Dholakiya JH, Venkatesh Babu R (2016) Saliency unified: a deep architecture for simultaneous eye fixation prediction and salient object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5781–5790
Kuen J, Wang Z, Wang G (2016) Recurrent attentional networks for saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3668–3677
Li G, Yu Y (2015) Visual saliency based on multiscale deep features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5455–5463
Li G, Yu Y (2016) Deep contrast learning for salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 478–487
Liu N, Han J (2016) Dhsnet: Deep hierarchical saliency network for salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 678–686
Liu N, Han J, Zhang D, Wen S, Liu T (2015) Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 362–370
Luo Z, Mishra A, Achkar A, Eichel J, Li S, Jodoin PM (2017) Non-local deep features for salient object detection. In: IEEE CVPR
Märki N, Perazzi F, Wang O, Sorkine-Hornung A (2016) Bilateral space video segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 743–751
Pan J, Sayrol E, Giro-i Nieto X, McGuinness K, O’Connor NE (2016) Shallow and deep convolutional networks for saliency prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 598–606
Papazoglou A, Ferrari V (2013) Fast object segmentation in unconstrained video. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1777–1784
Perazzi F, Pont-Tuset J, McWilliams B, Van Gool L, Gross M, Sorkine-Hornung A (2016) A benchmark dataset and evaluation methodology for video object segmentation. In: Computer Vision and Pattern Recognition
Pinheiro PO, Lin TY, Collobert R, Dollár P (2016) Learning to refine object segments. In: European Conference on Computer Vision, Springer, pp 75–91
Qin Y, Lu H, Xu Y, Wang H (2015) Saliency detection via cellular automata. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 110–119
Setlur V, Takagi S, Raskar R, Gleicher M, Gooch B (2005) Automatic image retargeting. In: Proceedings of the 4th international conference on Mobile and ubiquitous multimedia, ACM, pp 59–68
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Tokmakov P, Alahari K, Schmid C (2017) Learning motion patterns in videos. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Tokmakov P, Alahari K, Schmid C (2017) Learning video object segmentation with visual memory. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Tsai YH, Yang MH, Black MJ (2016) Video segmentation via object flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3899–3908
Voigtlaender P, Leibe B (2017) Online adaptation of convolutional neural networks for video object segmentation. British Machine Vision Conference
Wang L, Wang L, Lu H, Zhang P, Ruan X (2019) Salient object detection with recurrent fully convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(7):1734–1746
Wang Q, Gao J, Li X (2019) Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes. IEEE Transactions on Image Processing
Wang Q, Lin J, Yuan Y (2016) Salient band selection for hyperspectral image classification via manifold ranking. IEEE Transactions on Neural Networks and Learning Systems 27(6):1279–1289
Wang T, Borji A, Zhang L, Zhang P, Lu H (2017) A stagewise refinement model for detecting salient objects in images. In: Proceedings of the IEEE International Conference on Computer Vision, pp 4019–4028
Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403
Yan Q, Xu L, Shi J, Jia J (2013) Hierarchical saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1155–1162
Yang C, Zhang L, Lu H, Ruan X, Yang MH (2013) Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp s3166–3173
Yu J, Yang X, Gao F, Tao D (2016) Deep multimodal distance metric learning using click constraints for image ranking. IEEE Trans Cyber 47(12):4014–4024
Yu J, Zhang B, Kuang Z, Lin D, Fan J (2016) iprivacy: image privacy protection by identifying sensitive objects via deep multi-task learning. IEEE Trans Inf Foren Sec 12(5):1005–1016
Zhang P, Wang D, Lu H, Wang H, Ruan X (2017) Amulet: Aggregating multi-level convolutional features for salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 202–211
Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2018) M2det: A single-shot object detector based on multi-level feature pyramid network
Zhao R, Ouyang W, Li H, Wang X (2015) Saliency detection by multi-context deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1265–1274
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Majumder, A., Babu, R.V. & Chakraborty, A. PerSeg : segmenting salient objects from bag of single image perturbations. Multimed Tools Appl 79, 2473–2493 (2020). https://doi.org/10.1007/s11042-019-08388-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-019-08388-1