Abstract
Video saliency prediction aims to simulate human visual attention by locating the most relevant and informative regions within a video frame or sequence. Setting the audio track aside, both temporal and spatial information are essential for measuring video saliency, especially under challenging conditions such as fast motion, changing backgrounds, and nonrigid deformation. Applying image saliency models directly to video is therefore inadequate, because they neglect temporal information. To address this problem, this paper proposes a novel Bidirectional Multi-scale SpatioTemporal Network (BMST-Net) for detecting salient objects in videos. BMST-Net uses an encoder-decoder design to learn and map features across time and space for a given frame sequence. The model combines a bidirectional LSTM (Long Short-Term Memory) with a CNN (Convolutional Neural Network), and uses a single VGG16 (Visual Geometry Group) layer to extract features from the input video frames. Qualitative and quantitative evaluation on challenging, publicly available video datasets shows that the proposed approach achieves competitive performance compared with state-of-the-art saliency models.
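The abstract gives no layer-level details, so the following is only a toy sketch of the bidirectional idea, not the BMST-Net implementation. It assumes each frame has already been reduced to a small feature vector (standing in for CNN features) and replaces the LSTM cell with a simple `tanh` recurrence. The names `recurrent_pass`, `bidirectional_features`, and the `decay` parameter are hypothetical and chosen for illustration.

```python
import math


def recurrent_pass(frames, decay=0.5):
    """Run a toy recurrent update over per-frame feature vectors.

    h_t = tanh(decay * h_{t-1} + (1 - decay) * x_t)
    This is a simplified stand-in for an LSTM cell (assumption).
    """
    hidden = [0.0] * len(frames[0])
    states = []
    for x in frames:
        hidden = [math.tanh(decay * h + (1.0 - decay) * v)
                  for h, v in zip(hidden, x)]
        states.append(hidden)
    return states


def bidirectional_features(frames, decay=0.5):
    """Concatenate forward-in-time and backward-in-time states per frame."""
    forward = recurrent_pass(frames, decay)
    # Process the sequence in reverse, then restore the original frame order.
    backward = recurrent_pass(frames[::-1], decay)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Three frames, each described by a 2-dimensional feature vector.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
features = bidirectional_features(frames)
```

Each output vector is twice as long as the input: the first half summarizes the frames up to and including the current one, and the second half summarizes the current frame and those after it. A decoder could then map such features to a per-frame saliency map.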
Data availability
The dataset used in this work is publicly available.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
All authors contributed equally.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sharma, G., Singh, M., Kumain, S.C. et al. BMST-Net: bidirectional multi-scale spatiotemporal network for salient object detection in videos. SIViP 19, 99 (2025). https://doi.org/10.1007/s11760-024-03599-y
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11760-024-03599-y