Abstract
This paper proposes RVS, a semi-supervised approach to video salient object detection based on a one-dimensional recurrent neural network (RNN). RVS processes each frame independently, without explicitly considering temporal information. The RNN is trained on one-dimensional superpixel features to classify superpixels as salient foreground or non-salient background. Deep learning algorithms generally depend heavily on the size of the training data and often take an extremely long time to train. In contrast, RVS trains its RNN on a small dataset, which significantly reduces the training time. The approach is evaluated extensively, and its results are compared with those of several state-of-the-art methods on the public-domain VideoSeg, SegTrack v1 and SegTrack v2 benchmark video datasets. RVS is further tested on the authors’ own video dataset and on the complex DAVIS and video object segmentation datasets to evaluate the impact of motion and blur on its performance. It delivers results superior to those of several approaches that rely strongly on spatio-temporal features to detect salient objects in video sequences.
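The per-frame classification idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network architecture, the superpixel features, or the training procedure, so the Elman-style recurrent cell, the hidden size, the 0.5 threshold, the feature values, and all function names below (`init_rnn`, `saliency_probability`, `classify_frame`) are assumptions chosen for illustration. The weights are random and untrained, so the labels carry no meaning; the sketch only shows how a one-dimensional feature vector per superpixel could be fed through an RNN, one value per step, with each frame handled on its own.

```python
import math
import random


def init_rnn(hidden_size, seed=0):
    """Create untrained weights for a tiny Elman-style RNN with scalar inputs (assumed design)."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.5, 0.5)

    return {
        "w_in": [u() for _ in range(hidden_size)],  # input -> hidden
        "w_rec": [[u() for _ in range(hidden_size)] for _ in range(hidden_size)],  # hidden -> hidden
        "b_h": [0.0] * hidden_size,
        "w_out": [u() for _ in range(hidden_size)],  # hidden -> output
        "b_out": 0.0,
    }


def saliency_probability(features, params):
    """Feed one superpixel's 1-D feature vector through the RNN, one value per time step."""
    size = len(params["b_h"])
    hidden = [0.0] * size
    for x in features:
        hidden = [
            math.tanh(
                params["w_in"][j] * x
                + sum(params["w_rec"][j][k] * hidden[k] for k in range(size))
                + params["b_h"][j]
            )
            for j in range(size)
        ]
    # Sigmoid on the final hidden state gives a foreground probability.
    logit = sum(w * h for w, h in zip(params["w_out"], hidden)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


def classify_frame(superpixel_features, params, threshold=0.5):
    """Label each superpixel of a single frame independently: 1 = salient, 0 = background."""
    return [int(saliency_probability(f, params) >= threshold) for f in superpixel_features]


# Hypothetical frame: two superpixels, three made-up feature values each.
params = init_rnn(hidden_size=4)
frame = [[0.1, 0.8, 0.3], [0.9, 0.2, 0.5]]
labels = classify_frame(frame, params)
```

Because `classify_frame` takes only the features of the current frame, no state is carried from one frame to the next, which mirrors the abstract's statement that temporal information is not used explicitly. In practice the weights would be learned from labeled superpixels before any such call is useful.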
Acknowledgements
The authors gratefully acknowledge the support received from M S Ramaiah University of Applied Sciences, Bengaluru, India. They also thank the anonymous reviewers of this paper for their constructive criticism.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Kompella, A., Kulkarni, R.V. A semi-supervised recurrent neural network for video salient object detection. Neural Comput & Applic 33, 2065–2083 (2021). https://doi.org/10.1007/s00521-020-05081-5
DOI: https://doi.org/10.1007/s00521-020-05081-5