Abstract:
This letter presents a novel three-stream network for action recognition in extreme low resolution (LR) videos. In contrast to existing networks, the new network complements the two-stream network with a trajectory-spatial network, which is robust against visual distortion, instead of with pose information. The three-stream network is also combined with the inflated 3D ConvNet (I3D) model pre-trained on Kinetics to produce more discriminative spatio-temporal features in blurred LR videos. Moreover, a bidirectional self-attention network is aggregated with the three-stream network to further capture the various temporal dependencies among the spatio-temporal features. A new fusion strategy is also devised to integrate the information from the three different modalities. Simulations show that the new architecture outperforms the main state-of-the-art extreme LR action recognition methods on the HMDB-51 and IXMAS datasets.
Published in: IEEE Signal Processing Letters (Volume: 26, Issue: 8, August 2019)