A Novel Temporal Channel Enhancement and Contextual Excavation Network for Temporal Action Localization

Authors:

Zan Gao,

Meng Wang

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 6724 - 6733

https://doi.org/10.1145/3581783.3612167

Published: 27 October 2023

Abstract

The temporal action localization (TAL) task aims to locate and classify action instances in untrimmed videos. Most previous methods apply a classifier and a locator to the same feature, so the classification and localization processes run relatively independently of each other. As a result, when the two outputs are fused, the classification can be correct while the localization is wrong (or vice versa), which makes the final result inaccurate. To solve this problem, we propose a novel temporal channel enhancement and contextual excavation network (TCN) for the TAL task, which generates robust classification and localization features and refines the final localization results. Specifically, a temporal channel enhancement module is designed to enhance the temporal and channel information of the feature sequence. A temporal semantic contextual excavation module is then developed to establish relationships between similar frames. The features with enhanced contextual information are passed to a classifier, and executing the classification process yields powerful classification features. Most importantly, the refine localization module uses these robust classification features to produce the final localization features, from which the final localization results are obtained. Extensive experiments show that TCN outperforms all the SOTA methods on the THUMOS14 dataset and achieves comparable performance on the ActivityNet1.3 dataset. Compared with ActionFormer (ECCV 2022) and BREM (MM 2022) on the THUMOS14 dataset, the proposed TCN achieves improvements of 1.8% and 5.0%, respectively.
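
The failure case described in the abstract (correct class, wrong segment, or the reverse) can be made concrete with a small sketch. It is not taken from the paper: it assumes the common TAL evaluation convention in which a prediction counts as correct only if its label matches the ground truth and its temporal IoU (tIoU) with the ground-truth segment reaches a threshold. The helper names, the example labels and segments, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import Tuple

# A segment is (start_seconds, end_seconds); this layout is an assumption.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_label: str, pred_seg: Segment,
                     gt_label: str, gt_seg: Segment,
                     tiou_threshold: float = 0.5) -> bool:
    """Hypothetical check: both the label and the segment must be right."""
    return (pred_label == gt_label
            and temporal_iou(pred_seg, gt_seg) >= tiou_threshold)


# Made-up ground truth and three made-up predictions.
gt_label, gt_seg = "LongJump", (10.0, 20.0)
right_class_wrong_span = ("LongJump", (18.0, 30.0))
wrong_class_right_span = ("HighJump", (10.5, 19.5))
both_right = ("LongJump", (11.0, 20.0))

for name, (label, seg) in [("right class, wrong span", right_class_wrong_span),
                           ("wrong class, right span", wrong_class_right_span),
                           ("both right", both_right)]:
    ok = is_true_positive(label, seg, gt_label, gt_seg)
    print(f"{name}: tIoU={temporal_iou(seg, gt_seg):.2f}, correct={ok}")
```

Under these assumptions only the third prediction is counted as correct, which is why an error in either branch degrades the fused result.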

Supplemental Material

MP4 file: Paper 2074 presentation (25.56 MB)
