Abstract:
Facial expression recognition (FER) methods are fundamental in various human-computer interaction scenarios. Although deep learning-based models have made substantial progress in the FER field, they primarily focus on capturing facial appearance features while neglecting the importance of structure features, which encompass the overall shape and structural details of the key facial regions. We propose a Structure and Appearance Feature Cross-fusion Transformer (SAFCT) network to leverage both structure and appearance features. Specifically, we introduce a gradient-based structure feature to simultaneously capture the overall face shape and local organ variations. For appearance features, we extract both global and landmark-guided local features to capture global texture and local details. Furthermore, we employ a structure-dominated cross-fusion transformer to integrate these three facial features. Extensive experiments show that SAFCT achieves state-of-the-art recognition performance on widely used FER datasets.
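To illustrate the two ideas named in the abstract, below is a minimal, hypothetical sketch (not the authors' released code): structure features are approximated with fixed Sobel gradient filters, and a "structure-dominated" fusion step is modeled as cross-attention in which structure tokens act as queries over appearance tokens. All module names, dimensions, and the Sobel choice are assumptions for illustration only.

```python
# Hypothetical sketch of gradient-based structure extraction and
# structure-dominated cross-fusion; details are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SobelStructureExtractor(nn.Module):
    """Compute a per-pixel gradient-magnitude map as a simple structure feature."""
    def __init__(self):
        super().__init__()
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        ky = kx.t()
        # Buffers: fixed (non-trained) filters that move with the module's device.
        self.register_buffer("kx", kx.view(1, 1, 3, 3))
        self.register_buffer("ky", ky.view(1, 1, 3, 3))

    def forward(self, gray):  # gray: (B, 1, H, W) grayscale face crop
        gx = F.conv2d(gray, self.kx, padding=1)
        gy = F.conv2d(gray, self.ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)  # (B, 1, H, W)


class CrossFusionBlock(nn.Module):
    """Structure tokens (queries) attend to appearance tokens (keys/values)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, structure_tokens, appearance_tokens):
        # Structure dominates: it provides the queries that gather appearance cues.
        fused, _ = self.attn(structure_tokens, appearance_tokens, appearance_tokens)
        x = self.norm1(structure_tokens + fused)
        return self.norm2(x + self.ffn(x))


if __name__ == "__main__":
    img = torch.rand(2, 1, 112, 112)             # toy grayscale face crops
    struct_map = SobelStructureExtractor()(img)  # (2, 1, 112, 112)
    # Toy token sequences standing in for backbone feature maps.
    s_tok, a_tok = torch.rand(2, 49, 256), torch.rand(2, 49, 256)
    out = CrossFusionBlock()(s_tok, a_tok)
    print(struct_map.shape, out.shape)
```

In this reading, the gradient map supplies shape and contour cues while the cross-attention lets those structure cues select the matching appearance details; the paper's actual feature extractors and fusion layout may differ.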
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024