Patch-Aware Representation Learning for Facial Expression Recognition

Published: 27 October 2023

Abstract

Existing methods for facial expression recognition (FER) make little use of prior facial knowledge: they focus primarily on expression-related regions while neglecting to explicitly process expression-independent information. To address these issues, this paper proposes a patch-aware FER method that incorporates facial keypoints to guide the model and learns precise representations through two collaborative streams. First, facial keypoints are detected with a facial landmark detection algorithm, and the facial image is divided into equal-sized patches by the Patch Embedding Module. A correspondence between the keypoints and the patches is then established through a simplified conversion relationship. Two collaborative streams are introduced, each with its own masking strategy. The first stream masks, with a certain probability, the patches corresponding to the keypoints excluding those along the facial contour; the resulting image embedding is fed into the Encoder to obtain expression-related features, which are then passed to the Decoder to reconstruct the masked patches and to the Classifier to recognize the expression. The second stream masks the patches corresponding to all of the above keypoints; the resulting image embedding is passed through the Encoder and the Classifier in turn, and the resulting logits are constrained to approximate a uniform distribution. Through the first stream, the Encoder learns features in expression-related regions, while the second stream enables the Encoder to better ignore expression-independent information such as the background, facial contour, and hair. Experiments on two benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods.
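
To make the two-stream masking scheme concrete, the sketch below illustrates one way the pipeline described above could be wired up in PyTorch: keypoints are mapped to patch indices, and two differently masked copies of the patch embedding are passed through a shared encoder. The module choices, patch grid size, landmark counts (68 total, 51 excluding the contour), masking probability, and loss formulation are illustrative assumptions, not the authors' implementation; positional embeddings, the CLS token, and training details are omitted.

# Minimal sketch of the keypoint-to-patch conversion and the two collaborative streams.
# All sizes, modules, and hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def keypoints_to_patch_ids(keypoints, image_size=224, patch_size=16):
    # Map (x, y) landmark coordinates to the indices of the patches containing them
    # on a (image_size // patch_size) x (image_size // patch_size) grid.
    grid = image_size // patch_size
    cols = torch.div(keypoints[:, 0], patch_size, rounding_mode="floor").clamp(0, grid - 1).long()
    rows = torch.div(keypoints[:, 1], patch_size, rounding_mode="floor").clamp(0, grid - 1).long()
    return torch.unique(rows * grid + cols)

def mask_patches(patch_tokens, patch_ids, mask_token, prob=1.0):
    # Replace each selected patch token with the mask token with probability `prob`.
    tokens = patch_tokens.clone()
    keep = torch.rand(len(patch_ids)) < prob
    tokens[:, patch_ids[keep], :] = mask_token
    return tokens

# Hypothetical components: a ViT-style Encoder, a light Decoder that reconstructs
# masked patch pixels, and an expression Classifier head.
embed_dim, num_classes, num_patches = 768, 7, (224 // 16) ** 2
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True), num_layers=2)
decoder = nn.Linear(embed_dim, 16 * 16 * 3)      # 16x16x3 pixels per patch
classifier = nn.Linear(embed_dim, num_classes)
mask_token = torch.zeros(embed_dim)              # learnable in a real model

# Dummy inputs: the patch embeddings of one image plus its detected landmarks.
patch_tokens = torch.randn(1, num_patches, embed_dim)
inner_kpts = torch.rand(51, 2) * 224             # landmarks excluding the facial contour
all_kpts = torch.rand(68, 2) * 224               # all landmarks, contour included

# Stream 1: mask expression-related patches with some probability, then
# reconstruct the masked patches and recognize the expression.
ids1 = keypoints_to_patch_ids(inner_kpts)
feat1 = encoder(mask_patches(patch_tokens, ids1, mask_token, prob=0.5))
recon = decoder(feat1[:, ids1, :])               # compared against ground-truth patch pixels
logits1 = classifier(feat1.mean(dim=1))          # supervised with the expression label

# Stream 2: mask the patches around all keypoints and push the prediction made
# from the remaining (expression-independent) regions toward a uniform distribution.
ids2 = keypoints_to_patch_ids(all_kpts)
feat2 = encoder(mask_patches(patch_tokens, ids2, mask_token, prob=1.0))
logits2 = classifier(feat2.mean(dim=1))
uniform = torch.full_like(logits2, 1.0 / num_classes)
loss_uniform = F.kl_div(F.log_softmax(logits2, dim=-1), uniform, reduction="batchmean")

In a full training loop, the reconstruction error on recon, the cross-entropy on logits1, and the uniform-distribution term on logits2 would be combined into one objective; the 68/51 landmark split, the 0.5 masking probability, and the KL formulation above are purely illustrative choices.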

Cited By

  • (2024) Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition. Proceedings of the 32nd ACM International Conference on Multimedia, 4236-4245. DOI: 10.1145/3664647.3680747. Online publication date: 28-Oct-2024.
  • (2024) Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer. Proceedings of the 32nd ACM International Conference on Multimedia, 9340-9349. DOI: 10.1145/3664647.3680623. Online publication date: 28-Oct-2024.
  • (2024) Intra-class Compact Facial Expression Recognition Based on Amplitude Phase Separation. MultiMedia Modeling, 169-182. DOI: 10.1007/978-981-96-2061-6_13. Online publication date: 31-Dec-2024.

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. facial expression recognition
    2. facial landmarks
    3. patch-aware
    4. two collaborative streams

    Qualifiers

    • Research-article

    Funding Sources

    • the National Key R&D Program of China
    • the project from Anhui Science Technology Agency

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall acceptance rate: 2,145 of 8,556 submissions (25%)

    Article Metrics

    • Downloads (last 12 months): 79
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 01 Mar 2025
