An attention-based bidirectional GRU network for temporal action proposals generation

Liao, Xiaoxin; Yuan, Jingyi; Cai, Zemin; Lai, Jian-huang

doi:10.1007/s11227-022-04973-8

Published: 16 December 2022

The Journal of Supercomputing, volume 79, pages 8322–8339 (2023)

Abstract

Temporal action detection is an important yet challenging task in video understanding. Temporal action proposal generation is a common module in action detection, and it greatly affects detection performance. The module must not only generate proposals with accurate temporal boundaries, but also retrieve proposals that cover action instances with high recall using relatively few proposals. To address these difficulties, we propose an Actionness Score Optimization Model that improves the accuracy of generated proposals by capturing the global contextual information of untrimmed videos. First, a deconvolution layer is used to learn a nonlinear upsampling of the extracted features in both the spatial and temporal domains. Next, we introduce a bidirectional gated recurrent unit (GRU) into the network to reveal contextual information. Moreover, an attention mechanism is applied so that the network can focus on the most relevant parts of the information and obtain more reliable actionness scores. Finally, we validate the effectiveness of the proposed network on three challenging benchmark datasets: ActivityNet v1.2, ActivityNet v1.3, and THUMOS’14.
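The requirement stated above, high recall with relatively few proposals, is usually quantified with temporal intersection-over-union (tIoU) between proposals and ground-truth action instances. The following is a minimal sketch of that idea, not the authors' evaluation code: the function names `temporal_iou` and `recall_at_n`, the 0.5 threshold, and the toy segments are assumptions made for illustration, and proposals are assumed to be already sorted by descending confidence.

```python
from typing import List, Tuple

# A temporal segment as (start, end) in seconds. Hypothetical type alias.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_n(proposals: List[Segment],
                ground_truth: List[Segment],
                n: int,
                tiou_threshold: float = 0.5) -> float:
    """Fraction of ground-truth instances matched by any of the first n proposals.

    Assumes `proposals` is sorted by descending confidence.
    The default threshold of 0.5 is an illustrative assumption.
    """
    if not ground_truth:
        return 0.0
    top = proposals[:n]
    hits = sum(
        1 for gt in ground_truth
        if any(temporal_iou(p, gt) >= tiou_threshold for p in top)
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Toy data, invented for this example.
    gt = [(2.0, 8.0), (15.0, 20.0)]
    ranked = [(2.5, 8.5), (0.0, 30.0), (14.0, 19.0)]
    for n in (1, 2, 3):
        print(n, recall_at_n(ranked, gt, n))
```

In this toy case the long proposal `(0.0, 30.0)` overlaps both actions but matches neither at the chosen threshold, which illustrates why accurate boundaries matter as much as coverage. Benchmarks commonly average such recall over several tIoU thresholds rather than using a single one.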

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61876104 and 62002061). The authors wish to thank Prof. Jinwen Yan for his suggestions on preparing the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61876104 and 62002061).

Author information

Xiaoxin Liao, Jingyi Yuan, and Jian-huang Lai have contributed equally to this work.

Authors and Affiliations

Department of Electronic Engineering, Shantou University, Shantou, 515063, Guangdong, China
Xiaoxin Liao, Jingyi Yuan & Zemin Cai
The Key Lab of Digital Signal and Image Processing of Guangdong Province, Shantou, 515063, Guangdong, China
Xiaoxin Liao, Jingyi Yuan & Zemin Cai
The School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
Jian-huang Lai
The Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, 510006, Guangdong, China
Jian-huang Lai

Contributions

XL, JY, and ZC wrote the main manuscript text, and JL prepared Figures 1–4. All authors reviewed the manuscript.

Corresponding author

Correspondence to Zemin Cai.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or nonfinancial interest in the subject matter or materials discussed in this manuscript.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and materials

All data generated or analyzed during this study are available from the corresponding author on reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liao, X., Yuan, J., Cai, Z. et al. An attention-based bidirectional GRU network for temporal action proposals generation. J Supercomput 79, 8322–8339 (2023). https://doi.org/10.1007/s11227-022-04973-8

Accepted: 21 November 2022
Published: 16 December 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11227-022-04973-8
