Video saliency detection via combining temporal difference and pixel gradient

Multimedia Tools and Applications

Abstract

Although temporal information is important to the quality of video saliency detection, current network frameworks still suffer from problems such as poor spatiotemporal coherence and poor edge continuity. To address these problems, this paper proposes a fully convolutional neural network that integrates temporal difference and pixel gradient to fine-tune the edges of salient targets. Because the features of neighboring frames are highly correlated owing to their spatial proximity, a co-attention mechanism is used to apply pixel-wise weights to the saliency probability map after feature extraction with multi-scale pooling, so that attention is paid to both the edges and the central regions of images. Changes in the pixel gradients of the original images are used to recursively improve the continuity of target edges and the details of central areas. In addition, residual networks are used to integrate information between modules, ensuring stable connections between the backbone network and the modules as well as the propagation of pixel gradient changes. A self-adjustment strategy for the loss functions is also presented to address the overfitting observed in experiments. The method has been tested on three publicly available datasets, and comparison with six other typical state-of-the-art methods demonstrates its effectiveness.
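The two cues named in the title can be illustrated with a minimal hand-crafted sketch. This is not the network proposed in the paper: it only computes a frame-to-frame temporal difference and a pixel gradient magnitude on toy grayscale frames. The function names, the `alpha` weight, and the weighted-sum fusion in `combine_cues` are assumptions made for illustration; the paper instead applies these cues inside a fully convolutional network with co-attention.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame: rows of intensities in [0, 1]


def temporal_difference(prev: Frame, curr: Frame) -> Frame:
    """Absolute per-pixel intensity change between two adjacent frames."""
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def gradient_magnitude(frame: Frame) -> Frame:
    """Gradient magnitude from central differences (one-sided at borders)."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            gx = (frame[y][x1] - frame[y][x0]) / max(x1 - x0, 1)
            gy = (frame[y1][x] - frame[y0][x]) / max(y1 - y0, 1)
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def combine_cues(motion: Frame, edges: Frame, alpha: float = 0.5) -> Frame:
    """Hypothetical fusion: weighted sum of both cues, rescaled to [0, 1]."""
    fused = [[alpha * m + (1.0 - alpha) * e for m, e in zip(mrow, erow)]
             for mrow, erow in zip(motion, edges)]
    peak = max(max(row) for row in fused)
    if peak == 0:
        return fused  # no motion and no edges: leave the all-zero map as is
    return [[v / peak for v in row] for row in fused]


def make_frame(left: int, size: int = 6) -> Frame:
    """Toy frame: a bright 2x2 square whose left edge is at column `left`."""
    return [[1.0 if 2 <= y < 4 and left <= x < left + 2 else 0.0
             for x in range(size)] for y in range(size)]


# The square moves one pixel to the right between the two frames.
prev_frame, curr_frame = make_frame(1), make_frame(2)
motion = temporal_difference(prev_frame, curr_frame)
edges = gradient_magnitude(curr_frame)
saliency = combine_cues(motion, edges)
```

In this toy case the temporal difference responds only at the columns the square leaves and enters, while the gradient magnitude responds along the square's boundary in the current frame. The fused map is therefore strongest where a moving edge is present, which is the intuition behind using both cues to refine salient-object boundaries.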

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) (61976123, 61601427, 61876098); the Taishan Young Scholars Program of Shandong Province; and the Key Development Program for Basic Research of Shandong Province (ZR2020ZD44).

Author information

Corresponding authors

Correspondence to Muwei Jian or Hui Yu.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

About this article

Cite this article

Lu, X., Jian, M., Wang, R. et al. Video saliency detection via combining temporal difference and pixel gradient. Multimed Tools Appl 83, 37589–37602 (2024). https://doi.org/10.1007/s11042-023-17128-5

  • DOI: https://doi.org/10.1007/s11042-023-17128-5
