Impact Statement:
Graph convolutional networks (GCNs) have been widely used in skeleton-based action recognition. In this article, a temporal refinement graph convolution module with a contrastive learning mechanism is proposed to better model the latent features of motion dynamics by assigning different importance across the channel and spatiotemporal dimensions and by maximizing the learned mutual representations. An interframe correlation matrix is proposed to embed the distant temporal correlations of frame pairs into the skeletal representations and to generalize the GCN operator to the temporal domain. An STCA module is proposed to establish the short-range and long-term dependencies of a skeletal sequence by hierarchically enlarging the receptive field through successive feature flows within feature branches. The overall framework combines these three contributions, which improves action recognition performance on public datasets.
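The idea of an interframe correlation matrix acting as a graph over the time axis can be illustrated with a minimal sketch. This summary does not give the article's actual formulation, so everything below is an assumption made for illustration: the scaled dot-product affinity, the softmax row normalization, the toy per-frame feature vectors, and the hypothetical function names `temporal_correlation` and `temporal_graph_conv`.

```python
import math


def temporal_correlation(frames):
    """Return a T x T row-normalized affinity matrix between frame features.

    Assumption: affinity is a scaled dot product followed by a softmax over
    the second index; the article may define its matrix differently.
    """
    num_frames = len(frames)
    scale = math.sqrt(len(frames[0]))
    corr = []
    for i in range(num_frames):
        # Score frame i against every frame j, including distant ones.
        scores = [
            sum(a * b for a, b in zip(frames[i], frames[j])) / scale
            for j in range(num_frames)
        ]
        # Softmax over j; subtracting the max keeps exp() numerically stable.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        corr.append([e / total for e in exps])
    return corr


def temporal_graph_conv(frames, corr, weight):
    """Compute corr @ frames @ weight using plain Python lists."""
    num_frames = len(frames)
    in_dim = len(frames[0])
    out_dim = len(weight[0])
    # Mix features along the temporal axis using the correlation matrix.
    mixed = [
        [sum(corr[i][j] * frames[j][c] for j in range(num_frames)) for c in range(in_dim)]
        for i in range(num_frames)
    ]
    # Apply a linear projection to each mixed frame feature.
    return [
        [sum(mixed[i][c] * weight[c][o] for c in range(in_dim)) for o in range(out_dim)]
        for i in range(num_frames)
    ]


# Toy input: four frames with two-dimensional features (hypothetical values).
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
identity = [[1.0, 0.0], [0.0, 1.0]]
corr = temporal_correlation(frames)
out = temporal_graph_conv(frames, corr, identity)
```

In this sketch the matrix plays the role that the adjacency matrix plays in a spatial graph convolution, except that its nodes are frames rather than joints, so each output frame can draw on any other frame regardless of temporal distance.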
Abstract:
Human skeleton data, which has been used for human activity recognition, is arguably the most representative biometric characteristic owing to its intuitiveness and ease of visualization. State-of-the-art approaches mainly focus on improving the modeling of spatial correlations within graph topologies. However, interframe motion representations are also of vital importance, and we argue that they deserve further attention and exploration. Therefore, a temporal refinement module with a contrastive learning mechanism is proposed and fused as a complement to the conventional spatial graph convolution layer. In addition, to further exploit interframe variances and generalize the graph convolutional network (GCN) operation to the temporal dimension, a temporal-correlation matrix is introduced to effectively capture dynamic dependencies within frame pairs, enhancing semantic feature representation. Moreover, since GCN is a typical local operator which lacks the capability to fully m...
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 4, April 2024)