Learning Cross-Modal Audiovisual Representations with Ladder Networks for Emotion Recognition | IEEE Conference Publication | IEEE Xplore

Learning Cross-Modal Audiovisual Representations with Ladder Networks for Emotion Recognition


Abstract:

Representation learning is a challenging, but essential task in audiovisual learning. A key challenge is to generate strong cross-modal representations while still captur...Show More

Abstract:

Representation learning is a challenging, but essential task in audiovisual learning. A key challenge is to generate strong cross-modal representations while still capturing discriminative information contained in unimodal features. Properly capturing this information is important to increase accuracy and robustness in audiovisual tasks. Focusing on emotion recognition, this study proposes novel cross-modal ladder networks to capture modality-specific information while building strong cross-modal representations. Our method utilizes representations from a backbone network to implement unsupervised auxiliary tasks to reconstruct intermediate layer representations across the acoustic and visual networks. The skip connections between the cross-modal encoder and decoder provide powerful modality-specific and multimodal representations for emotion recognition. Our model on the CREMA-D corpus achieves high performance with precision, recall, and F1 scores over 80% on a six-class problem.
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
ISBN Information:

ISSN Information:

Conference Location: Rhodes Island, Greece

Contact IEEE to Subscribe

References

References is not available for this document.