
Feature Refinement with Masked Cascaded Network for Temporal Action Localization


Abstract:

Despite the great progress in temporal action localization (TAL), most existing methods directly use video encoders trained on the trimmed Kinetics-400 dataset to obtain clip-level visual features, ignoring the cross-dataset bias between Kinetics-400 and TAL benchmarks. Such a dataset bias leads to poor visual representations, potentially hindering performance in both temporal detection and action recognition for TAL. In this paper, we propose a novel TAL method, termed feature refinement with masked cascaded network (FR-MCN), to tackle the above problem. Specifically, FR-MCN presents a new feature refinement strategy by introducing a clip-level feature classification task for both action and background clips, which improves the temporal sensitivity and enhances the action semantics of the visual features. Moreover, FR-MCN employs a masked cascaded paradigm for refinement to learn the semantic disparities between action and background clips near action boundaries, enabling the starting and ending instants to be detected accurately for TAL. Extensive experimental results on THUMOS14 and ActivityNet-v1.3 demonstrate that our FR-MCN significantly improves action localization performance.
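The abstract only describes the refinement idea at a high level. The minimal PyTorch sketch below illustrates what a clip-level action/background classification head for refining pre-extracted clip features could look like; the module structure, feature dimensions, and loss are illustrative assumptions, not the architecture reported in the paper.

```python
# Illustrative sketch only: the abstract does not specify FR-MCN's architecture,
# so layer choices, dimensions, and the loss below are assumptions.
import torch
import torch.nn as nn


class ClipFeatureRefiner(nn.Module):
    """Hypothetical refinement head: re-encodes pre-extracted clip features and
    classifies each clip as action vs. background, so that the refined features
    become more temporally sensitive before being fed to a localization head."""

    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=2):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier = nn.Conv1d(hidden_dim, num_classes, kernel_size=1)

    def forward(self, clip_feats):
        # clip_feats: (B, T, feat_dim) -> (B, feat_dim, T) for 1-D convolutions
        x = clip_feats.transpose(1, 2)
        refined = self.refine(x)                 # (B, hidden_dim, T)
        logits = self.classifier(refined)        # (B, 2, T): action vs. background
        return refined.transpose(1, 2), logits.transpose(1, 2)


if __name__ == "__main__":
    # Training signal: per-clip cross-entropy against action/background labels.
    feats = torch.randn(4, 128, 2048)            # 4 videos, 128 clips each
    labels = torch.randint(0, 2, (4, 128))        # 1 = action clip, 0 = background
    model = ClipFeatureRefiner()
    refined, logits = model(feats)
    loss = nn.functional.cross_entropy(logits.reshape(-1, 2), labels.reshape(-1))
    loss.backward()
```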
Date of Conference: 04-07 December 2023
Date Added to IEEE Xplore: 29 January 2024
Conference Location: Jeju, Korea, Republic of

