A Unified Video Semantics Extraction and Noise Object Suppression Network for Video Saliency Detection

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Abstract

Video salient object detection (VSOD) aims to segment the most visually attractive objects from a video sequence. Exploring video semantics and suppressing noise objects are two key challenges in VSOD. In this paper, we propose a unified end-to-end network with video Semantics Extraction and Noise Object suppression (SENO). SENO consists of two modules: a video semantics module (VSM) and a contrastive learning module (CLM). VSM extracts video semantics by computing global pixel correspondences across frames, thereby locating the salient objects in the video. CLM pulls video foreground features closer together and pushes interference objects away, which enhances effective salient features and suppresses noise objects. CLM is applied only during training, so it adds no extra overhead at inference. In addition, SENO does not rely on pre-processing temporal modeling techniques such as optical flow, avoiding the high computational cost and accumulated inaccuracy these complex models introduce. Experimental results on five benchmark testing datasets show that SENO outperforms state-of-the-art methods while detecting in real time.
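Since this page reproduces only the abstract, the following is a minimal PyTorch-style sketch, not the authors' implementation, of the two mechanisms the abstract names: a global pixel-correspondence attention standing in for the VSM, and an InfoNCE-style contrastive loss standing in for the CLM. The module structure, tensor shapes, 1x1-conv projections, and the temperature `tau` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VideoSemanticsModule(nn.Module):
    """Toy VSM: every pixel of every frame attends to all pixels of the
    clip, so features that agree across frames (the salient object) are
    enhanced. Shapes and projections are assumptions, not the paper's."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (T, C, H, W) backbone features for the T frames of a clip.
        t, c, h, w = feats.shape
        q = self.query(feats).flatten(2).permute(0, 2, 1).reshape(t * h * w, c)
        k = self.key(feats).flatten(2).permute(0, 2, 1).reshape(t * h * w, c)
        v = self.value(feats).flatten(2).permute(0, 2, 1).reshape(t * h * w, c)
        # Global pixel correspondences: a (THW, THW) affinity over the clip.
        attn = F.softmax(q @ k.t() / c ** 0.5, dim=-1)
        out = (attn @ v).reshape(t, h * w, c).permute(0, 2, 1)
        return feats + out.reshape(t, c, h, w)  # residual enhancement


def contrastive_loss(fg: torch.Tensor, noise: torch.Tensor,
                     tau: float = 0.1) -> torch.Tensor:
    """Toy CLM objective, used during training only: pull foreground
    embeddings of the same video together, push noise-object embeddings
    away. fg: (N, D) with N >= 2, noise: (M, D); tau is an assumption."""
    fg = F.normalize(fg, dim=1)
    noise = F.normalize(noise, dim=1)
    pos = torch.exp(fg @ fg.t() / tau)            # foreground-foreground pairs
    neg = torch.exp(fg @ noise.t() / tau).sum(1)  # foreground-noise pairs
    # Average the positives, excluding each embedding's self-similarity.
    off_diag = ~torch.eye(len(fg), dtype=torch.bool, device=fg.device)
    pos = (pos * off_diag).sum(1) / off_diag.sum(1)
    return -torch.log(pos / (pos + neg)).mean()
```

In a real pipeline the foreground and noise embeddings would plausibly come from pooling decoder features under predicted salient and distractor regions; because the loss touches only training, dropping it at inference leaves the detection path unchanged, which is the property the abstract highlights.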

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62176062.

Author information

Correspondence to Xiaodong Gu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tan, Z., Gu, X. (2023). A Unified Video Semantics Extraction and Noise Object Suppression Network for Video Saliency Detection. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_28

  • DOI: https://doi.org/10.1007/978-3-031-44195-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44194-3

  • Online ISBN: 978-3-031-44195-0

  • eBook Packages: Computer Science, Computer Science (R0)
