Abstract:
Weakly supervised temporal action localization (WSTAL) is a practical but challenging issue in video understanding. However, most existing methods have to activate backgr...Show MoreMetadata
Abstract:
Weakly supervised temporal action localization (WSTAL) is a practical but challenging issue in video understanding. However, most existing methods have to activate background snippets or deactivate action snippets in cases of no boundary annotations, which inevitably affects the localization of action instances. In this letter, we propose a spatial enhancement and temporal constraint (SETC) model to address this problem from three aspects. Specifically, we first propose a spatial enhancement module to enhance the discrimination of the extracted features. Then we leverage the instance sparse constraint to restrain the drastic fluctuation class activation sequence (CAS). Finally, we use the confidence connectivity enhancement to connect the snippets that are broken up by mistake. Experiments on THUMOS'14 and ActivityNet datasets validate the efficacy of SETC against existing state-of-the-art WSTAL algorithms.
Published in: IEEE Signal Processing Letters ( Volume: 27)