Abstract
Video salient object detection (VSOD) aims to distinguish salient objects from complex backgrounds and highlight them uniformly in the spatiotemporal domain. A fundamental challenge in VSOD is how to make the best use of temporal information to boost performance. We propose a dual temporal memory network (DTMNet) that addresses temporal modeling in VSOD by storing short- and long-term video sequence information preceding the current frame as temporal memories. The proposed network consists of two temporal sub-modules: a short-term co-inference learning (SCL) sub-module and a long-range memory learning (LML) sub-module. The SCL infers spatiotemporal interactions between neighboring frames of the current input video clip. The LML follows the temporal order of the video and learns long-range information between the current clip and the previous video clips. Comprehensive evaluations demonstrate the effectiveness and robustness of the proposed architecture.
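The split between a short-term and a long-range memory can be pictured with a toy sketch. This is not the DTMNet implementation: the abstract does not specify how SCL or LML are computed, so the example assumes plain averaging in place of the learned modules. The names `short_term_fuse`, `DualTemporalMemory`, and `capacity` are hypothetical, and frames are stood in for by small feature vectors.

```python
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def mean_vectors(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]


def short_term_fuse(clip: Sequence[Vector]) -> List[Vector]:
    """Assumed stand-in for short-term interaction: average each frame
    with its immediate neighbors inside the same clip."""
    fused = []
    for i in range(len(clip)):
        neighbors = clip[max(0, i - 1): i + 2]
        fused.append(mean_vectors(neighbors))
    return fused


class DualTemporalMemory:
    """Toy dual memory: per-clip neighbor fusion plus a bounded queue of
    summaries of earlier clips (hypothetical design, not the paper's)."""

    def __init__(self, capacity: int = 3) -> None:
        # Holds one summary vector per previously seen clip.
        self.long_memory: Deque[Vector] = deque(maxlen=capacity)

    def step(self, clip: Sequence[Vector]) -> List[Vector]:
        short = short_term_fuse(clip)
        if self.long_memory:
            # Blend every frame with the mean of the stored clip summaries.
            context = mean_vectors(list(self.long_memory))
            output = [mean_vectors([frame, context]) for frame in short]
        else:
            output = short
        # Write after reading, so the memory only ever describes
        # clips that precede the current one.
        self.long_memory.append(mean_vectors(short))
        return output


memory = DualTemporalMemory(capacity=2)
first = memory.step([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
second = memory.step([[4.0, 4.0], [4.0, 4.0], [4.0, 4.0]])
```

The one design point the sketch shares with the abstract is ordering: the long-range memory is read before it is updated, so each clip is processed only with information from earlier clips.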
Acknowledgments
This work was supported in part by the National Key Research and Development Program of China under Grant 2018AAA0100400, and in part by the National Science Fund of China under Grants 62272235 and U21B2044.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Z., Li, J., Li, J. (2023). Dual Temporal Memory Network for Video Salient Object Detection. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14358. Springer, Cham. https://doi.org/10.1007/978-3-031-46314-3_31
DOI: https://doi.org/10.1007/978-3-031-46314-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46313-6
Online ISBN: 978-3-031-46314-3
eBook Packages: Computer Science, Computer Science (R0)