A semi-supervised recurrent neural network for video salient object detection

  • Original Article
  • Neural Computing and Applications

Abstract

This paper proposes RVS, a semi-supervised, one-dimensional recurrent neural network (RNN) approach for video salient object detection. RVS processes each frame independently, without explicitly considering temporal information. The RNN is trained on one-dimensional superpixel features to classify superpixels as salient foreground or non-salient background. Deep learning algorithms generally depend heavily on the size of the training data and often take an extremely long time to train. In contrast, RVS trains an RNN on a small dataset, which significantly reduces training time. RVS has been extensively evaluated, and its results are compared with those of several state-of-the-art methods on the public-domain VideoSeg, SegTrack v1 and SegTrack v2 benchmark video datasets. Further, RVS has been tested on the authors’ own video dataset and on the complex DAVIS and video object segmentation datasets to evaluate the impact of motion and blur on its performance. RVS delivers results superior to those of several approaches that rely strongly on spatio-temporal features to detect salient objects in video sequences.
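The abstract describes training on one-dimensional superpixel features with a foreground/background label per superpixel, one frame at a time. The following is a minimal sketch of how such per-frame training pairs could be assembled, not the authors' implementation. It assumes a superpixel label map is already available (for example from an algorithm such as SLIC), uses the mean RGB colour as a stand-in feature, and labels a superpixel as salient when more than half of its pixels are foreground in an annotation mask. The function names, the feature choice and the 0.5 threshold are all illustrative assumptions.

```python
from collections import defaultdict


def superpixel_features(frame, labels):
    """Return {superpixel_id: [mean_r, mean_g, mean_b]} for one frame.

    Assumption: mean colour is only a placeholder feature; the abstract
    does not say which superpixel features RVS uses.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for pixel_row, label_row in zip(frame, labels):
        for (r, g, b), sp in zip(pixel_row, label_row):
            acc = sums[sp]
            acc[0] += r
            acc[1] += g
            acc[2] += b
            counts[sp] += 1
    return {sp: [c / counts[sp] for c in sums[sp]] for sp in sums}


def superpixel_targets(mask, labels, threshold=0.5):
    """Return {superpixel_id: 1 or 0}; 1 means salient foreground.

    Assumption: a superpixel is foreground when the fraction of its
    pixels marked 1 in the binary mask exceeds `threshold`.
    """
    foreground = defaultdict(int)
    total = defaultdict(int)
    for mask_row, label_row in zip(mask, labels):
        for m, sp in zip(mask_row, label_row):
            foreground[sp] += m
            total[sp] += 1
    return {sp: int(foreground[sp] / total[sp] > threshold) for sp in total}


def frame_to_sequence(frame, labels, mask):
    """Turn one frame into a 1-D sequence of (feature vector, target) pairs.

    Each frame is handled on its own; no information from other frames
    is used.
    """
    feats = superpixel_features(frame, labels)
    targets = superpixel_targets(mask, labels)
    order = sorted(feats)  # fixed superpixel order within the frame
    return [feats[sp] for sp in order], [targets[sp] for sp in order]


# Toy 2x4 frame with two superpixels (ids 0 and 1).
frame = [
    [(200, 0, 0), (200, 0, 0), (0, 0, 100), (0, 0, 100)],
    [(100, 0, 0), (100, 0, 0), (0, 0, 50), (0, 0, 50)],
]
labels = [[0, 0, 1, 1], [0, 0, 1, 1]]
mask = [[1, 1, 0, 0], [1, 0, 0, 0]]

X, y = frame_to_sequence(frame, labels, mask)
print(X)  # [[150.0, 0.0, 0.0], [0.0, 0.0, 75.0]]
print(y)  # [1, 0]
```

The resulting per-frame sequence `X` and targets `y` are the kind of input a sequence classifier such as an RNN could be trained on; the network itself is left out of the sketch because the abstract does not describe its architecture.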

Acknowledgements

The authors gratefully acknowledge the support received from M S Ramaiah University of Applied Sciences, Bengaluru, India. They also thank the anonymous reviewers of this paper for their constructive criticism.

Author information

Corresponding author

Correspondence to Aditya Kompella.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1421 KB)

About this article

Cite this article

Kompella, A., Kulkarni, R.V. A semi-supervised recurrent neural network for video salient object detection. Neural Comput & Applic 33, 2065–2083 (2021). https://doi.org/10.1007/s00521-020-05081-5

  • DOI: https://doi.org/10.1007/s00521-020-05081-5
