BMST-Net: bidirectional multi-scale spatiotemporal network for salient object detection in videos

Sharma, Gaurav; Singh, Maheep; Kumain, Sandeep Chand; Kumar, Kamal

doi:10.1007/s11760-024-03599-y

Original Paper
Published: 09 December 2024

Volume 19, article number 99 (2025)
Signal, Image and Video Processing

Gaurav Sharma^1, Maheep Singh^1,2, Sandeep Chand Kumain^1,3 & Kamal Kumar^1,4

Abstract

Video saliency prediction aims to simulate human visual attention by locating the most pertinent and informative regions within a video frame or sequence. Setting audio aside, both temporal and spatial information are essential for measuring video saliency, especially under challenging conditions such as fast motion, changing backgrounds, and non-rigid deformation. Applying image saliency models directly to video is inappropriate because doing so neglects temporal information. To address these problems, this paper proposes a novel Bidirectional Multi-scale SpatioTemporal Network (BMST-Net) for identifying salient objects in video. For any given frame sequence, BMST-Net uses an encoder-decoder design to learn and map features over time and space. The model combines a bidirectional LSTM (Long Short-Term Memory) with a CNN (Convolutional Neural Network), with a single VGG16 (Visual Geometry Group) layer used to extract features from the input video frames. In qualitative and quantitative evaluations on publicly available, challenging video datasets, the proposed approach achieves competitive performance relative to state-of-the-art saliency models.
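
As a rough illustration of the bidirectional idea described in the abstract, the following Python sketch runs a recurrent pass over per-frame features in both temporal directions and pairs the results for each frame. It is not the BMST-Net implementation: the plain tanh cell stands in for an LSTM, the scalar values in frame_features stand in for CNN (VGG16) feature maps, and the names recurrent_pass, bidirectional_pass, w_in, and w_rec are hypothetical.

```python
import math


def recurrent_pass(features, w_in=0.5, w_rec=0.5):
    """Run a plain tanh recurrent cell over a sequence of per-frame scalars.

    Assumption: this simple cell is a stand-in for an LSTM cell.
    """
    state = 0.0
    states = []
    for x in features:
        # The new state mixes the current frame with the previous state.
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


def bidirectional_pass(features, w_in=0.5, w_rec=0.5):
    """Pair each frame's forward state with its backward state.

    The forward state summarizes the current and earlier frames; the
    backward state summarizes the current and later frames.
    """
    forward = recurrent_pass(features, w_in, w_rec)
    # Process the reversed sequence, then flip back to frame order.
    backward = recurrent_pass(features[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


# Hypothetical per-frame encoder outputs (illustrative values only).
frame_features = [0.1, 0.9, 0.4, 0.0]
states = bidirectional_pass(frame_features)
for index, (fwd, bwd) in enumerate(states):
    print(f"frame {index}: forward={fwd:.3f} backward={bwd:.3f}")
```

In this sketch each frame ends up with context from both earlier and later frames, which is the property a bidirectional recurrent layer adds over a unidirectional one; a decoder would then map the paired states to a saliency map.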

Data availability

The dataset used in this work is publicly available.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Uttarakhand, Srinagar Garhwal, India
Gaurav Sharma, Maheep Singh, Sandeep Chand Kumain & Kamal Kumar
Department of Computer Science, Doon University, Dehradun, India
Maheep Singh
School of Computer Sciences, UPES, Dehradun, India
Sandeep Chand Kumain
Department of Information Technology, IGDTUW, Delhi, India
Kamal Kumar

Contributions

All authors contributed equally.

Corresponding author

Correspondence to Maheep Singh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sharma, G., Singh, M., Kumain, S.C. et al. BMST-Net: bidirectional multi-scale spatiotemporal network for salient object detection in videos. SIViP 19, 99 (2025). https://doi.org/10.1007/s11760-024-03599-y

Received: 07 January 2024
Revised: 18 October 2024
Accepted: 03 November 2024
Published: 09 December 2024
DOI: https://doi.org/10.1007/s11760-024-03599-y