
PerSeg : segmenting salient objects from bag of single image perturbations

Multimedia Tools and Applications

Abstract

Salient object segmentation is an important computer vision problem with applications in areas such as video surveillance, scene parsing, and autonomous navigation. For images, the task is challenging because of background clutter and texture, and because the objects of interest may have low resolution or low contrast. For videos, additional issues such as object deformation, camera motion, and the presence of multiple moving objects make foreground object segmentation a difficult and open problem. However, motion patterns can also act as an important cue for separating foreground objects from the background. Recent approaches exploit this cue by aggregating temporally perturbed information from a series of consecutive frames. For single images, this additional cue is not available. In this paper, we propose to emulate the effect of such perturbations by constructing a bag of multiple augmentations of a single input image. Saliency features are estimated independently from each perturbed image in the bag and are then combined using a novel aggregation strategy based on a convolutional gated recurrent encoder-decoder unit. Through extensive experiments on benchmark datasets, we show better or very competitive performance compared with state-of-the-art methods. We further observe that even a bag constructed using simple affine transformations achieves impressive performance, which demonstrates the robustness of the proposed framework.
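The pipeline outlined in the abstract (build a bag of perturbed copies of one image, estimate saliency on each copy independently, then aggregate) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `make_bag`, `toy_saliency`, and `segment_with_bag` are hypothetical, the colour-contrast saliency is a placeholder for a learned saliency network, and the plain mean is a stand-in for the paper's convolutional gated recurrent encoder-decoder aggregator. Mapping each saliency map back to the original image frame before aggregation is also an assumption of this sketch. Flips and 90-degree rotations are used because they are simple affine transformations with exact inverses.

```python
import numpy as np

def make_bag(image):
    """Return (perturbed_image, inverse_fn) pairs for one input image.

    Assumption: the bag holds only flips and 90-degree rotations, which are
    simple affine transforms that can be undone exactly.
    """
    bag = [(image, lambda m: m)]                          # identity
    bag.append((image[:, ::-1], lambda m: m[:, ::-1]))    # horizontal flip
    bag.append((image[::-1, :], lambda m: m[::-1, :]))    # vertical flip
    for k in (1, 2, 3):                                   # 90, 180, 270 degrees
        bag.append((np.rot90(image, k), lambda m, k=k: np.rot90(m, -k)))
    return bag

def toy_saliency(image):
    """Placeholder saliency: distance of each pixel from the mean colour.

    A stand-in for a learned saliency estimator, not the paper's network.
    """
    dist = np.linalg.norm(image - image.mean(axis=(0, 1)), axis=-1)
    return dist / (dist.max() + 1e-8)                     # scale into [0, 1)

def segment_with_bag(image):
    """Estimate saliency per perturbed image, undo the perturbation, aggregate."""
    maps = [inverse(toy_saliency(perturbed))
            for perturbed, inverse in make_bag(image)]
    # Stand-in aggregation: a plain mean over the bag. The paper instead uses
    # a convolutional gated recurrent encoder-decoder unit.
    return np.mean(np.stack(maps), axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.random((8, 12, 3))                         # random H x W x 3 image
    print(segment_with_bag(demo).shape)
```

With this placeholder saliency every map in the bag is identical once it is mapped back, so the mean adds nothing. The bag becomes useful only when the per-image estimator responds differently to each perturbation, as a learned network typically would.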



Author information

Corresponding author

Correspondence to Avishek Majumder.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Majumder, A., Babu, R.V. & Chakraborty, A. PerSeg : segmenting salient objects from bag of single image perturbations. Multimed Tools Appl 79, 2473–2493 (2020). https://doi.org/10.1007/s11042-019-08388-1

  • DOI: https://doi.org/10.1007/s11042-019-08388-1
