Abstract:
Action Quality Assessment (AQA) plays a crucial role in action understanding, and the task poses unique challenges because the visual differences among actions are subtle. Existing action assessment works typically make a single overall quality prediction for an entire video. However, parsing the internal structure of an action is important in action quality assessment because it enhances the interpretability of the scoring process. To explore this underlying structural relationship, we propose an action parsing transformer that decomposes the holistic feature into more fine-grained step-wise representations. Specifically, we use a set of learnable queries to represent the step-wise patterns of a specific action, and our decoding process converts the video representation into a fixed number of step representations. Moreover, to obtain quality scores, we devise a score generation module comprising multiple action scorers, each of which is uniquely associated with a specific step and predicts the corresponding step score. Extensive experiments on two public AQA benchmarks suggest that our method assesses action quality well and achieves outstanding performance.
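The decoding idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single cross-attention layer with one head, randomly initialized (untrained) weights, one linear scorer per step, and a total score formed by summing the step scores; the abstract does not specify any of these details. The names `StepParserSketch`, `parse`, and `score` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class StepParserSketch:
    """Hypothetical one-layer stand-in for a query-based action parsing decoder.

    Assumptions (not from the paper): single head, single layer, random
    weights, linear per-step scorers, total score = sum of step scores.
    """

    def __init__(self, dim, num_steps, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        # One query vector per action step (learnable in a real model).
        self.queries = rng.normal(size=(num_steps, dim)) * scale
        # Key and value projections applied to the video features.
        self.w_k = rng.normal(size=(dim, dim)) * scale
        self.w_v = rng.normal(size=(dim, dim)) * scale
        # One independent linear scorer (weight row and bias) per step.
        self.scorer_w = rng.normal(size=(num_steps, dim)) * scale
        self.scorer_b = np.zeros(num_steps)

    def parse(self, video_feats):
        """Map (T, dim) clip features to (num_steps, dim) step representations."""
        keys = video_feats @ self.w_k
        values = video_feats @ self.w_v
        # Each query attends over all T clips: attention has shape (num_steps, T).
        attn = softmax(self.queries @ keys.T / np.sqrt(self.dim), axis=-1)
        return attn @ values, attn

    def score(self, video_feats):
        """Return per-step scores and their sum as an overall score."""
        steps, _ = self.parse(video_feats)
        # Scorer k is applied only to step representation k.
        step_scores = np.einsum("kd,kd->k", steps, self.scorer_w) + self.scorer_b
        return step_scores, step_scores.sum()


model = StepParserSketch(dim=16, num_steps=4)
clip_feats = np.random.default_rng(1).normal(size=(10, 16))  # 10 clips, 16-d each
step_scores, total = model.score(clip_feats)
print(step_scores.shape)  # (4,)
```

Because the number of queries is fixed, the sketch returns the same number of step representations and step scores regardless of how many clips the video contains, which mirrors the "fixed number of step representations" property stated in the abstract.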
Published in: 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP)
Date of Conference: 04-07 December 2023
Date Added to IEEE Xplore: 29 January 2024