
BMST-Net: bidirectional multi-scale spatiotemporal network for salient object detection in videos

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Video saliency prediction aims to simulate human visual attention by locating the most relevant and informative regions within a video frame or sequence. Leaving the audio modality aside, both temporal and spatial information are essential for measuring video saliency, especially under challenging conditions such as rapid motion, changing backgrounds, and non-rigid deformation. Moreover, applying image saliency models directly to videos is inadequate because it neglects temporal information. To address these problems, this paper proposes a novel Bidirectional Multi-scale SpatioTemporal Network (BMST-Net) for detecting salient video objects. BMST-Net yields strong results for any given frame sequence, using an encoder-decoder scheme to learn and map features across space and time. The model combines a CNN (Convolutional Neural Network) with a bidirectional LSTM (Long Short-Term Memory), where a single VGG16 (Visual Geometry Group) layer is used to extract features from the input video frames. The proposed approach produces noteworthy results in both qualitative and quantitative evaluations on publicly available, challenging video datasets, achieving performance competitive with state-of-the-art saliency models.
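
To make the architecture outline above concrete, here is a minimal sketch of such a pipeline in PyTorch: per-frame spatial features from a VGG16 backbone, a bidirectional LSTM over the temporal axis, and a small decoder that upsamples to a per-frame saliency map. This is an illustrative reconstruction, not the authors' implementation; the choice of VGG16 stage, the global pooling of frame features, the hidden size, and the decoder design are all assumptions made for the example.

    # Illustrative sketch only: layer choices and sizes are assumptions,
    # not the BMST-Net configuration reported in the paper.
    import torch
    import torch.nn as nn
    from torchvision.models import vgg16

    class BMSTSketch(nn.Module):
        def __init__(self, hidden=256):
            super().__init__()
            # VGG16 convolutional stages as the per-frame spatial feature extractor
            self.backbone = vgg16(weights=None).features          # (B*T, 512, H/32, W/32)
            # Bidirectional LSTM over pooled frame descriptors (temporal modeling)
            self.temporal = nn.LSTM(input_size=512, hidden_size=hidden,
                                    batch_first=True, bidirectional=True)
            # Decoder: fuse temporal context with spatial features, predict saliency
            self.decoder = nn.Sequential(
                nn.Conv2d(512 + 2 * hidden, 128, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(128, 1, kernel_size=1),
            )

        def forward(self, clip):                                   # clip: (B, T, 3, H, W)
            b, t, c, h, w = clip.shape
            feats = self.backbone(clip.reshape(b * t, c, h, w))    # spatial features per frame
            _, ch, fh, fw = feats.shape
            pooled = feats.mean(dim=(2, 3)).reshape(b, t, ch)      # global frame descriptors
            context, _ = self.temporal(pooled)                     # (B, T, 2*hidden)
            # Broadcast temporal context over the spatial grid and fuse with features
            ctx = context.reshape(b * t, -1, 1, 1).expand(-1, -1, fh, fw)
            logits = self.decoder(torch.cat([feats, ctx], dim=1))  # (B*T, 1, fh, fw)
            sal = torch.sigmoid(nn.functional.interpolate(
                logits, size=(h, w), mode="bilinear", align_corners=False))
            return sal.reshape(b, t, 1, h, w)                      # per-frame saliency maps

    # Example: one 4-frame clip of 224x224 RGB frames
    model = BMSTSketch()
    maps = model(torch.randn(1, 4, 3, 224, 224))
    print(maps.shape)  # torch.Size([1, 4, 1, 224, 224])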

Data availability

The datasets used in this work are publicly available.

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed equally.

Corresponding author

Correspondence to Maheep Singh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sharma, G., Singh, M., Kumain, S.C. et al. BMST-Net: bidirectional multi-scale spatiotemporal network for salient object detection in videos. SIViP 19, 99 (2025). https://doi.org/10.1007/s11760-024-03599-y

  • Received:
  • Revised:
  • Accepted:
  • Published:
  • DOI: https://doi.org/10.1007/s11760-024-03599-y

Keywords