Two-stream graph convolutional neural network fusion for weakly supervised temporal action detection

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

Weakly supervised temporal action detection is an important and challenging task: it must detect the temporal intervals of actions and identify their categories using only video-level labels. Correctly identifying the transition states between action and background improves detection accuracy; this paper therefore focuses on filtering out transition states and proposes a two-stream graph convolutional neural network fusion for weakly supervised temporal action detection. In general, a transition state changes prominently and lasts only a short time, and its characteristics differ from those of the action itself. The feature difference between two temporally interacting video segments therefore indicates whether a segment belongs to a transition state. Based on the feature similarity and temporal correlation of the segments, a semantic similarity weighted graph and a transition-aware temporal correlation graph are constructed. Finally, a temporal attention sequence over the video segments is extracted from the fused two-stream graph features. The attention-based feature representation is fed to a linear classifier to generate the class activation sequence, from which temporal action detection is performed. Experimental results on public datasets show that the proposed method effectively improves action detection performance.
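
To make this pipeline concrete, the PyTorch sketch below shows one plausible way such a model could be wired together. It is a minimal illustration under stated assumptions rather than the authors' implementation: the 2048-dimensional segment features (as produced by I3D), the cosine-similarity construction of the semantic similarity weighted graph, the neighbor-difference construction of the transition-aware temporal correlation graph, the single-layer graph convolutions, and the additive fusion are all simplifications chosen for readability.

import torch
import torch.nn as nn
import torch.nn.functional as F


def semantic_similarity_graph(x):
    # x: (T, D) segment features of one untrimmed video.
    # Edges weighted by cosine similarity between segment features,
    # row-normalized so each row sums to one.
    x_norm = F.normalize(x, dim=-1)
    sim = torch.relu(x_norm @ x_norm.t())
    return sim / sim.sum(dim=-1, keepdim=True).clamp(min=1e-6)


def transition_aware_graph(x):
    # Connects only temporal neighbors and down-weights edges across abrupt
    # feature changes: a large feature difference between adjacent segments
    # suggests a transition state, so the corresponding edge is weak.
    T = x.size(0)
    adj = torch.eye(T, device=x.device)               # self-loops
    d = (x[1:] - x[:-1]).norm(dim=-1)                 # neighbor feature difference
    w = torch.exp(-d)                                 # small difference -> strong edge
    idx = torch.arange(T - 1, device=x.device)
    adj[idx, idx + 1] = w
    adj[idx + 1, idx] = w
    return adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)


class TwoStreamGraphAttention(nn.Module):
    # Two single-layer graph convolutions (one per graph), additive fusion,
    # a temporal attention branch, and a linear classifier producing the
    # class activation sequence (CAS).
    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=20):
        super().__init__()
        self.gc_sem = nn.Linear(feat_dim, hidden_dim)
        self.gc_tem = nn.Linear(feat_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (T, D) features, e.g. I3D features of T video segments.
        a_sem = semantic_similarity_graph(x)
        a_tem = transition_aware_graph(x)
        h_sem = torch.relu(self.gc_sem(a_sem @ x))    # graph convolution: A X W
        h_tem = torch.relu(self.gc_tem(a_tem @ x))
        h = h_sem + h_tem                             # fuse the two graph streams
        attn = torch.sigmoid(self.attn(h))            # (T, 1) temporal attention
        cas = self.classifier(attn * h)               # (T, C) class activation sequence
        video_score = (attn * cas).sum(0) / attn.sum(0).clamp(min=1e-6)
        return attn.squeeze(-1), cas, video_score


if __name__ == "__main__":
    model = TwoStreamGraphAttention()
    feats = torch.randn(120, 2048)                    # 120 segments of one video
    attention, cas, video_score = model(feats)
    print(attention.shape, cas.shape, video_score.shape)

In practice, the class activation sequence and attention values would be thresholded over time to obtain action intervals, and the attention-weighted video-level scores would be trained against the video-level labels; those steps are omitted from this sketch.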

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61771420 and 62001413, the Natural Science Foundation of Hebei Province under Grant F2020203064, and the Doctoral Foundation of Yanshan University under Grant BL18033.

Author information

Corresponding author

Correspondence to Zhengping Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, M., Hu, Z., Li, S. et al. Two-stream graph convolutional neural network fusion for weakly supervised temporal action detection. SIViP 16, 947–954 (2022). https://doi.org/10.1007/s11760-021-02039-5
