Automatic emotion recognition for video clips has become a popular area of research in recent years. Previous studies have explored monomodal emotion recognition based on voice, text, facial expression, or physiological signals. We exploit the complementarity of these information sources and construct an automatic emotion recognition model based on deep learning and a multimodal fusion strategy. In this model, visual, audio, and text features are extracted from the video clips. A decision-level fusion strategy based on evidence theory is proposed to fuse the multiple classification results. To address the problem of conflicting evidence in evidence theory, we propose a compatibility algorithm that corrects conflicting evidence using a similarity matrix computed over the pieces of evidence. This approach is shown to improve the accuracy of emotion recognition.
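Decision-level fusion under evidence theory typically combines per-modality mass functions with Dempster's rule, and similarity-based conflict correction is often done by credibility-weighted averaging of the evidence before combination. The sketch below illustrates that general scheme under those assumptions; the function names and the specific similarity measure (one minus half the L1 distance) are illustrative, not the paper's exact compatibility algorithm.

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over singleton hypotheses."""
    labels = m1.keys() & m2.keys()
    agreement = {a: m1[a] * m2[a] for a in labels}
    k = 1.0 - sum(agreement.values())  # conflict mass K
    if k >= 1.0:
        raise ValueError("total conflict; evidence cannot be combined")
    # Normalize the agreeing mass by 1 - K
    return {a: v / (1.0 - k) for a, v in agreement.items()}

def similarity(m1, m2):
    """Illustrative similarity: 1 minus half the L1 distance between masses."""
    return 1.0 - 0.5 * sum(abs(m1[a] - m2[a]) for a in m1)

def combine_with_conflict_correction(evidences):
    """Credibility-weighted averaging (Murphy/Deng-style), then Dempster fusion.

    Each evidence is weighted by how much the others support it, so an
    outlier modality contributes less to the fused decision.
    """
    n = len(evidences)
    support = [sum(similarity(evidences[i], evidences[j])
                   for j in range(n) if j != i) for i in range(n)]
    total = sum(support)
    cred = [s / total for s in support]  # normalized credibility weights
    labels = evidences[0].keys()
    avg = {a: sum(cred[i] * evidences[i][a] for i in range(n)) for a in labels}
    # Combine the averaged evidence with itself n - 1 times
    result = avg
    for _ in range(n - 1):
        result = dempster_combine(result, avg)
    return result
```

For example, fusing three modality classifiers where two favor "happy" and one conflicts strongly would down-weight the outlier via its low credibility, and the fused mass concentrates on "happy" rather than being vetoed by the conflicting evidence, which is the failure mode plain Dempster combination exhibits under high conflict.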
Cited by 10 scholarly publications.
Keywords: Feature extraction, Video, Databases, Sun, Convolution, Facial recognition systems, Neural networks