Facial Expression Recognition Based on Depth Fusion and Discriminative Association Learning

Neural Processing Letters

Abstract

As an effective deep semi-supervised approach, deep discriminative association learning has achieved impressive performance in facial expression recognition (FER). However, the instability of facial appearance (e.g., under illumination changes) remains a major challenge for FER. To this end, we propose a novel multi-modal deep discriminative association (MDDA) framework that better exploits facial depth information and unlabeled data. First, the facial depth map is generated via 3D face reconstruction and encoded in three channels to learn more representative features. Second, we design a novel deep multi-loss semi-supervised network based on association learning that exploits multi-modal information through a cross-fusion mechanism. We evaluate the proposed method on the RaFD and Oulu-CASIA datasets and achieve accuracies of 95.64% and 66.75%, respectively. Compared with the existing deep discriminative association learning approach, the encoded facial depth map information increases accuracy by 1.01% and 4.88%, respectively. Moreover, extensive experiments confirm that the proposed approach performs comparably to existing deep networks and is more robust to illumination changes.
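To make the phrase "based on association learning" concrete, the following is a minimal NumPy sketch of the round-trip ("walker") loss from the general learning-by-association formulation. It is an illustration under stated assumptions, not the MDDA network itself: the function name `walker_loss`, the dot-product similarity, and the toy embeddings are all hypothetical choices, and the multi-loss and cross-fusion parts of the framework are not modeled.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def walker_loss(a, labels, b, eps=1e-8):
    """Round-trip loss: labeled -> unlabeled -> labeled.

    a:      (N, D) embeddings of labeled samples
    labels: (N,)   integer class labels for the rows of a
    b:      (M, D) embeddings of unlabeled samples
    """
    sim = a @ b.T                        # (N, M) dot-product similarities (assumption)
    p_ab = softmax(sim, axis=1)          # transition probabilities a_i -> b_j
    p_ba = softmax(sim.T, axis=1)        # transition probabilities b_j -> a_i
    p_aba = p_ab @ p_ba                  # (N, N) round-trip probabilities
    # Target: uniform over all labeled samples that share the start sample's class.
    same = (labels[:, None] == labels[None, :]).astype(float)
    target = same / same.sum(axis=1, keepdims=True)
    # Cross-entropy between the target and the round-trip distribution.
    return float(-(target * np.log(p_aba + eps)).sum(axis=1).mean())


if __name__ == "__main__":
    a = np.array([[5.0, 0.0], [0.0, 5.0]])
    labels = np.array([0, 1])
    b_clustered = np.array([[5.0, 0.0], [0.0, 5.0]])  # one unlabeled point per class
    b_ambiguous = np.array([[1.0, 1.0]])              # equally similar to both classes
    print(walker_loss(a, labels, b_clustered))
    print(walker_loss(a, labels, b_ambiguous))
```

The loss is small when a walk that starts at a labeled sample and passes through the unlabeled embeddings returns to a sample of the same class, which is how unlabeled data can shape the embedding in this family of methods.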

Notes

  1. When the distribution of the true labels of the unlabeled features \(\boldsymbol{b}_j\) is not uniform (which is often the case in real-world applications), additional parameters may be needed to refine the visit loss.
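The note above can be illustrated with a small sketch of a visit loss that accepts a target distribution. This is only one possible reading, assuming the standard visit loss (cross-entropy between the average probability of visiting each unlabeled sample and a target over those samples); the `target` argument is a hypothetical stand-in for the "additional parameters" mentioned in the note, and the function name and dot-product similarity are likewise assumptions.

```python
import numpy as np


def visit_loss(a, b, target=None, eps=1e-8):
    """Cross-entropy between a target and the visit distribution over unlabeled samples.

    a:      (N, D) embeddings of labeled samples
    b:      (M, D) embeddings of unlabeled samples
    target: optional (M,) non-negative weights; defaults to uniform (hypothetical parameter)
    """
    sim = a @ b.T                                   # (N, M) similarities (assumption)
    sim = sim - sim.max(axis=1, keepdims=True)      # stabilize the softmax
    p_ab = np.exp(sim)
    p_ab /= p_ab.sum(axis=1, keepdims=True)         # transition probabilities a_i -> b_j
    visit = p_ab.mean(axis=0)                       # how often each b_j is visited
    if target is None:
        target = np.full(visit.shape, 1.0 / visit.size)
    target = np.asarray(target, dtype=float)
    target = target / target.sum()                  # normalize to a distribution
    return float(-(target * np.log(visit + eps)).sum())


if __name__ == "__main__":
    # Three labeled samples of one class, one of another: visits are skewed.
    a = np.array([[5.0, 0.0], [5.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
    b = np.array([[5.0, 0.0], [0.0, 5.0]])
    print(visit_loss(a, b))                         # uniform target
    print(visit_loss(a, b, target=[0.75, 0.25]))    # target matching the skew
```

With a uniform target, a skewed visit distribution is penalized even when the skew reflects the real class balance; a non-uniform target removes that penalty, which is the situation the note describes.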

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61872188, 61861136011, 61972204, 61902250 and 61976145.

Author information

Corresponding author

Correspondence to Zhong Jin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Jin, X., Lai, Z., Sun, W. et al. Facial Expression Recognition Based on Depth Fusion and Discriminative Association Learning. Neural Process Lett 54, 2025–2047 (2022). https://doi.org/10.1007/s11063-021-10717-1

  • DOI: https://doi.org/10.1007/s11063-021-10717-1
