research-article

Blindfold Attention: Novel Mask Strategy for Facial Expression Recognition

Authors:
Bo Fu, Dalian University of Technology, Dalian, China
Yuanxin Mao, Liaoning Normal University, Dalian, China
Shilin Fu, Liaoning Normal University, Dalian, China
Yonggong Ren, Liaoning Normal University, Dalian, China
Zhongxuan Luo, Dalian University of Technology, Dalian, China

ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval, June 2022, Pages 624–630
https://doi.org/10.1145/3512527.3531416

Published: 27 June 2022

ABSTRACT

Facial Expression Recognition (FER) is a basic and crucial computer vision task that classifies human face images into emotion categories such as happy, sad, surprised, scared, and angry. Recently, deep-learning-based facial expression recognition has made great progress. However, whether they rely on weight initialization techniques or on attention mechanisms, deep-learning-based methods struggle to capture features that are visually insignificant but semantically important. To address this problem, we present a novel FER training strategy consisting of two components: Memo Affinity Loss (MAL) and Mask Attention Fine Tuning (MAFT). MAL is a variant of center loss that uses a memory bank strategy together with discriminative centers. It widens the distance between different clusters and narrows the distance within each cluster. As a result, the features extracted by the CNN are more comprehensive and independent, which yields a more robust model. MAFT temporarily blindfolds the attended regions and forces the model to learn from other important regions of the input image. It is not only an augmentation technique but also a novel fine-tuning approach. To the best of our knowledge, we are the first to apply a mask strategy to the attended regions and to use it to fine-tune models. Finally, to implement our ideas, we constructed a new network named Architecture Attention ResNet, based on ResNet-18. Our methods are conceptually and practically simple, yet they achieve superior results on popular public facial expression recognition benchmarks: 88.75% on RAF-DB, 65.17% on AffectNet-7, and 60.72% on AffectNet-8. The code will be released as open source soon.
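
The two ideas in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the abstract gives no formulas for MAL or MAFT, so the code shows only the generic building blocks they are described as extending. `blindfold` hides the most-attended cells of an image given an attention map, and `center_loss` is the plain center-loss term that pulls features toward their class center, without the memory bank or the inter-cluster term. The function names, the `ratio` and `fill` parameters, and the list-of-lists image format are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def blindfold(image: Matrix, attention: Matrix,
              ratio: float = 0.25, fill: float = 0.0) -> Matrix:
    """Return a copy of `image` with the most-attended cells set to `fill`.

    `attention` must have the same shape as `image`. `ratio` is the fraction
    of cells to hide (a hypothetical knob, not a value from the paper).
    """
    h, w = len(image), len(image[0])
    k = int(round(ratio * h * w))  # number of cells to hide
    # Rank every cell by its attention score, highest first.
    ranked = sorted(
        ((attention[r][c], r, c) for r in range(h) for c in range(w)),
        key=lambda t: -t[0],
    )
    hidden = {(r, c) for _, r, c in ranked[:k]}
    # Build a new image so the input is left untouched.
    return [
        [fill if (r, c) in hidden else image[r][c] for c in range(w)]
        for r in range(h)
    ]


def center_loss(features: Sequence[Sequence[float]],
                labels: Sequence[int],
                centers: Dict[int, Sequence[float]]) -> float:
    """Half the squared distance of each feature to its class center,
    averaged over the batch (intra-cluster term only)."""
    total = 0.0
    for feat, label in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(feat, centers[label]))
    return 0.5 * total / len(features)


if __name__ == "__main__":
    image = [[1.0, 2.0], [3.0, 4.0]]
    attention = [[0.1, 0.9], [0.2, 0.3]]
    print(blindfold(image, attention, ratio=0.25))
    print(center_loss([[1.0, 0.0], [0.0, 2.0]], [0, 1],
                      {0: [0.0, 0.0], 1: [0.0, 0.0]}))
```

In a fine-tuning loop of the kind the abstract describes, the masked image would be fed back to the model so that the classification loss has to be minimized from the remaining regions. How the attention map is obtained and how much of it is hidden are design choices that the abstract does not specify.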

Index Terms

Computing methodologies > Artificial intelligence > Computer vision > Computer vision tasks > Activity recognition and understanding

Published in
ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval
June 2022
714 pages
ISBN:9781450392389
DOI:10.1145/3512527
General Chairs: Vincent Oria (New Jersey Institute of Technology, USA), Maria Luisa Sapino (Università degli Studi di Torino, Italy), Shin'ichi Satoh (National Institute of Informatics, Japan), Brigitte Kerhervé (Université du Québec à Montréal, Canada)
Program Chairs: Wen-Huang Cheng (National Yang Ming Chiao Tung University, Taiwan), Ichiro Ide (Nagoya University, Japan), Vivek Singh (Rutgers University, USA)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
affect
computer vision
deep learning
facial expression recognition
Acceptance Rates
Overall Acceptance Rate: 254 of 830 submissions, 31%