
MBGNet: Multi-branch boundary generation network with temporal context aggregation for temporal action detection

Published in Applied Intelligence.

Abstract

Temporal action detection is a fundamental video understanding task that aims to locate the temporal regions where human actions or events may occur and to identify the classes of those actions in untrimmed videos. The main challenge is that videos are untrimmed and usually of widely varying durations. Although existing methods have achieved strong results in recent years, several challenges remain: video context features are not fully exploited, the generated action boundaries are insufficiently accurate, and the relationships between proposals are ignored. To address these issues, this paper proposes a Multi-branch Boundary Generation Network (MBGNet) with temporal context aggregation, which improves temporal action proposal generation by exploiting rich temporal context features and complementary boundary generators. First, we propose a multi-path temporal context feature aggregation (MTCA) module that exploits both local and global contextual temporal features for generating temporal action proposals. Second, to generate accurate action boundaries, we design a multi-branch temporal boundary detector (MBG) that optimises the prediction results by exploiting the complementary relationship between two boundary detectors. In addition, to accurately predict the confidence of densely distributed proposals, we design a proposal relation-aware module (PRAM) that exploits global correlation to model the relationships between proposals. Experiments on the popular ActivityNet1.3, THUMOS14, and HACS datasets demonstrate the effectiveness of the proposed method on temporal action proposal generation: it produces action proposals with high precision and recall. Moreover, combined with existing action classifiers, it also achieves strong performance on temporal action detection. These results demonstrate that the proposed method improves the accuracy of both temporal action proposal generation and detection.
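The abstract does not specify how the proposal relation-aware module models global correlation. The sketch below shows one common way such relation modelling is done, using plain scaled dot-product self-attention over per-proposal features; the function name `proposal_self_attention` and the feature shapes are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def proposal_self_attention(feats):
    """Refine proposal features via scaled dot-product self-attention.

    feats: (N, C) array, one feature vector per candidate proposal.
    Returns an array of the same shape in which each proposal's
    feature becomes a weighted mix of all proposals' features, so
    every confidence prediction can draw on global proposal context.
    """
    n, c = feats.shape
    scores = feats @ feats.T / np.sqrt(c)   # pairwise proposal correlation
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ feats                  # context-aggregated features

# Toy example: 4 candidate proposals with 8-dim features
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
y = proposal_self_attention(x)
print(y.shape)
```

The design intuition is that overlapping or nearby proposals carry evidence about one another, so letting every proposal attend to all others (rather than only its temporal neighbours) captures the global correlations the abstract refers to.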



Data Availability and access

We evaluate our proposed method on the public datasets ActivityNet and THUMOS. The ActivityNet dataset is available at http://activity-net.org/. The THUMOS dataset is available at https://www.crcv.ucf.edu/THUMOS14/download.html.


Acknowledgements

The research work was supported by the Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing.

Author information

Contributions

Xiaoying Pan (first author): Supervision, Conceptualization, Writing - Review & Editing, Formal Analysis, Methodology, Project Administration. Ni Juan Zhang: Conceptualization, Methodology, Data Curation, Writing, Validation, Software, Formal Analysis, Visualization. He Wei Xie: Data Curation, Validation. Shou Kun Li: Data Curation. Tong Feng: Visualization.

Corresponding author

Correspondence to Xiaoying Pan.

Ethics declarations

Competing Interests

No potential conflict of interest was reported by the authors.

Ethical and informed consent for data used

We guarantee that the data used are obtained from official sources and are authorized for use; otherwise, we bear the responsibility.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Notations

Below is a table that summarizes and briefly describes the important symbols in this paper:

Table 14 Description of the symbols used in this paper

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pan, X., Zhang, N., Xie, H. et al. MBGNet: Multi-branch boundary generation network with temporal context aggregation for temporal action detection. Appl Intell 54, 9045–9066 (2024). https://doi.org/10.1007/s10489-024-05664-y
