Abstract
Video salient object detection (VSOD) aims to segment the most visually salient objects in a video sequence. Two key challenges in VSOD are exploring video semantics and suppressing noise objects. In this paper, we propose a unified end-to-end network with video Semantics Extraction and Noise Object suppression (SENO). SENO comprises two modules: a video semantics module (VSM) and a contrastive learning module (CLM). The VSM extracts video semantics by computing global pixel correspondences, thereby locating the salient objects across frames. The CLM pulls video foreground features closer together and pushes interference objects away, which enhances effective salient features and suppresses noise objects. The CLM is applied only during training, so it adds no extra overhead at inference. In addition, SENO does not rely on pre-processing temporal modeling techniques such as optical flow, avoiding the high computational cost and accumulated inaccuracies of such complex models. Experimental results on five benchmark testing datasets show that SENO outperforms state-of-the-art methods. Moreover, SENO produces detection results in real time.
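The pull/push behavior the abstract attributes to the CLM can be illustrated with a generic supervised InfoNCE-style loss over sampled pixel embeddings. This is a minimal NumPy sketch of the general technique, not the paper's exact CLM formulation; the function name, the binary foreground/noise labels, and the temperature value are illustrative assumptions.

```python
import numpy as np

def pixel_contrastive_loss(feats, labels, tau=0.1):
    """Supervised InfoNCE-style loss over sampled pixel embeddings.

    feats:  (N, D) array of pixel embeddings
    labels: (N,) array, 1 = salient foreground, 0 = noise object
    tau:    temperature controlling the sharpness of the softmax
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)   # cosine space
    sim = feats @ feats.T / tau                                    # (N, N) similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask   # same-label pairs
    sim = np.where(self_mask, -np.inf, sim)                        # exclude self-pairs
    # numerically stable row-wise log-softmax
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - (row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True)))
    # minimize the negative average log-probability of each anchor's positives:
    # this pulls same-label embeddings together and pushes different-label ones apart
    pos_count = pos_mask.sum(axis=1)
    anchor_loss = -np.where(pos_mask, log_prob, 0.0).sum(axis=1) / np.maximum(pos_count, 1)
    return anchor_loss[pos_count > 0].mean()
```

Because the loss only shapes the embedding space during training, dropping it at inference (as the abstract describes for the CLM) leaves the forward pass unchanged.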
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62176062.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tan, Z., Gu, X. (2023). A Unified Video Semantics Extraction and Noise Object Suppression Network for Video Saliency Detection. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_28
Print ISBN: 978-3-031-44194-3
Online ISBN: 978-3-031-44195-0