A Unified Video Semantics Extraction and Noise Object Suppression Network for Video Saliency Detection

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Abstract

Video salient object detection (VSOD) aims to segment the most visually attractive objects from a video sequence. Exploring video semantics and suppressing noise objects are two key challenges in VSOD. In this paper, we propose a unified end-to-end network with video Semantics Extraction and Noise Object suppression (SENO). SENO consists of two modules: a video semantics module (VSM) and a contrastive learning module (CLM). VSM extracts video semantics by computing global pixel correspondences across frames, thereby locating the salient objects in the video. CLM pulls video foreground features closer together and pushes interference objects away, which enhances effective salient features and suppresses noise objects. CLM is applied only during training, so it adds no extra overhead at inference. In addition, SENO does not rely on pre-processing temporal modeling techniques such as optical flow, avoiding the high computational cost and accumulated inaccuracy these complex models introduce. Experimental results on five benchmark testing datasets show that SENO outperforms state-of-the-art methods while detecting in real time.
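Since this page reproduces only the abstract, the following is a minimal PyTorch-style sketch, not the authors' implementation, of the two mechanisms the abstract names: a global pixel-correspondence attention standing in for the VSM, and an InfoNCE-style contrastive loss standing in for the CLM. The module structure, tensor shapes, 1x1-conv projections, and the temperature `tau` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VideoSemanticsModule(nn.Module):
    """Toy VSM: every pixel of every frame attends to all pixels of the
    clip, so features that agree across frames (the salient object) are
    enhanced. Shapes and projections are assumptions, not the paper's."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (T, C, H, W) backbone features for the T frames of a clip.
        t, c, h, w = feats.shape
        q = self.query(feats).flatten(2).permute(0, 2, 1).reshape(t * h * w, c)
        k = self.key(feats).flatten(2).permute(0, 2, 1).reshape(t * h * w, c)
        v = self.value(feats).flatten(2).permute(0, 2, 1).reshape(t * h * w, c)
        # Global pixel correspondences: a (THW, THW) affinity over the clip.
        attn = F.softmax(q @ k.t() / c ** 0.5, dim=-1)
        out = (attn @ v).reshape(t, h * w, c).permute(0, 2, 1)
        return feats + out.reshape(t, c, h, w)  # residual enhancement


def contrastive_loss(fg: torch.Tensor, noise: torch.Tensor,
                     tau: float = 0.1) -> torch.Tensor:
    """Toy CLM objective, used during training only: pull foreground
    embeddings of the same video together, push noise-object embeddings
    away. fg: (N, D) with N >= 2, noise: (M, D); tau is an assumption."""
    fg = F.normalize(fg, dim=1)
    noise = F.normalize(noise, dim=1)
    pos = torch.exp(fg @ fg.t() / tau)            # foreground-foreground pairs
    neg = torch.exp(fg @ noise.t() / tau).sum(1)  # foreground-noise pairs
    # Average the positives, excluding each embedding's self-similarity.
    off_diag = ~torch.eye(len(fg), dtype=torch.bool, device=fg.device)
    pos = (pos * off_diag).sum(1) / off_diag.sum(1)
    return -torch.log(pos / (pos + neg)).mean()
```

In a real pipeline the foreground and noise embeddings would plausibly come from pooling decoder features under predicted salient and distractor regions; because the loss touches only training, dropping it at inference leaves the detection path unchanged, which is the property the abstract highlights.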

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62176062.

Author information

Correspondence to Xiaodong Gu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tan, Z., Gu, X. (2023). A Unified Video Semantics Extraction and Noise Object Suppression Network for Video Saliency Detection. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_28

  • DOI: https://doi.org/10.1007/978-3-031-44195-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44194-3

  • Online ISBN: 978-3-031-44195-0

  • eBook Packages: Computer Science, Computer Science (R0)
