Patch-Aware Representation Learning for Facial Expression Recognition

Published: 27 October 2023

Abstract

Existing methods for facial expression recognition (FER) make little use of prior facial knowledge: they focus primarily on expression-related regions while neglecting to explicitly process expression-independent information. To address these issues, this paper proposes a patch-aware FER method that incorporates facial keypoints to guide the model and learns precise representations through two collaborative streams. First, facial keypoints are detected with a facial landmark detection algorithm, and the facial image is divided into equal-sized patches by the Patch Embedding Module. A correspondence between the keypoints and the patches is then established through a simplified conversion relationship. Two collaborative streams are introduced, each with its own masking strategy. The first stream masks, with a certain probability, the patches corresponding to the keypoints excluding those along the facial contour; the resulting image embedding is fed into the Encoder to obtain expression-related features, which are then passed to the Decoder to reconstruct the masked patches and to the Classifier to recognize the expression. The second stream masks the patches corresponding to all of the above keypoints; the resulting image embedding is passed through the Encoder and the Classifier in turn, and the resulting logits are constrained to approximate a uniform distribution. Through the first stream, the Encoder learns features in expression-related regions, while the second stream enables the Encoder to better ignore expression-independent information such as the background, facial contour, and hair. Experiments on two benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods.
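
To make the two-stream masking scheme concrete, the sketch below illustrates one way the pipeline described above could be wired up in PyTorch: keypoints are mapped to patch indices, and two differently masked copies of the patch embedding are passed through a shared encoder. The module choices, patch grid size, landmark counts (68 total, 51 excluding the contour), masking probability, and loss formulation are illustrative assumptions, not the authors' implementation; positional embeddings, the CLS token, and training details are omitted.

# Minimal sketch of the keypoint-to-patch conversion and the two collaborative streams.
# All sizes, modules, and hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def keypoints_to_patch_ids(keypoints, image_size=224, patch_size=16):
    # Map (x, y) landmark coordinates to the indices of the patches containing them
    # on a (image_size // patch_size) x (image_size // patch_size) grid.
    grid = image_size // patch_size
    cols = torch.div(keypoints[:, 0], patch_size, rounding_mode="floor").clamp(0, grid - 1).long()
    rows = torch.div(keypoints[:, 1], patch_size, rounding_mode="floor").clamp(0, grid - 1).long()
    return torch.unique(rows * grid + cols)

def mask_patches(patch_tokens, patch_ids, mask_token, prob=1.0):
    # Replace each selected patch token with the mask token with probability `prob`.
    tokens = patch_tokens.clone()
    keep = torch.rand(len(patch_ids)) < prob
    tokens[:, patch_ids[keep], :] = mask_token
    return tokens

# Hypothetical components: a ViT-style Encoder, a light Decoder that reconstructs
# masked patch pixels, and an expression Classifier head.
embed_dim, num_classes, num_patches = 768, 7, (224 // 16) ** 2
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True), num_layers=2)
decoder = nn.Linear(embed_dim, 16 * 16 * 3)      # 16x16x3 pixels per patch
classifier = nn.Linear(embed_dim, num_classes)
mask_token = torch.zeros(embed_dim)              # learnable in a real model

# Dummy inputs: the patch embeddings of one image plus its detected landmarks.
patch_tokens = torch.randn(1, num_patches, embed_dim)
inner_kpts = torch.rand(51, 2) * 224             # landmarks excluding the facial contour
all_kpts = torch.rand(68, 2) * 224               # all landmarks, contour included

# Stream 1: mask expression-related patches with some probability, then
# reconstruct the masked patches and recognize the expression.
ids1 = keypoints_to_patch_ids(inner_kpts)
feat1 = encoder(mask_patches(patch_tokens, ids1, mask_token, prob=0.5))
recon = decoder(feat1[:, ids1, :])               # compared against ground-truth patch pixels
logits1 = classifier(feat1.mean(dim=1))          # supervised with the expression label

# Stream 2: mask the patches around all keypoints and push the prediction made
# from the remaining (expression-independent) regions toward a uniform distribution.
ids2 = keypoints_to_patch_ids(all_kpts)
feat2 = encoder(mask_patches(patch_tokens, ids2, mask_token, prob=1.0))
logits2 = classifier(feat2.mean(dim=1))
uniform = torch.full_like(logits2, 1.0 / num_classes)
loss_uniform = F.kl_div(F.log_softmax(logits2, dim=-1), uniform, reduction="batchmean")

In a full training loop, the reconstruction error on recon, the cross-entropy on logits1, and the uniform-distribution term on logits2 would be combined into one objective; the 68/51 landmark split, the 0.5 masking probability, and the KL formulation above are purely illustrative choices.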

Cited By

  • (2024) Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition. Proceedings of the 32nd ACM International Conference on Multimedia, 4236-4245. DOI: 10.1145/3664647.3680747. Online publication date: 28-Oct-2024.
  • (2024) Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer. Proceedings of the 32nd ACM International Conference on Multimedia, 9340-9349. DOI: 10.1145/3664647.3680623. Online publication date: 28-Oct-2024.
  • (2024) Intra-class Compact Facial Expression Recognition Based on Amplitude Phase Separation. MultiMedia Modeling, 169-182. DOI: 10.1007/978-981-96-2061-6_13. Online publication date: 31-Dec-2024.

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. facial expression recognition
    2. facial landmarks
    3. patch-aware
    4. two collaborative streams

    Qualifiers

    • Research-article

    Funding Sources

    • the National Key R&D Program of China
    • the project from Anhui Science Technology Agency

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall acceptance rate: 2,145 of 8,556 submissions (25%)

    Article Metrics

    • Downloads (last 12 months): 79
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 01 Mar 2025
