DOI: 10.1145/3512527.3531416 — ICMR Conference Proceedings
research-article

Blindfold Attention: Novel Mask Strategy for Facial Expression Recognition

Published: 27 June 2022

ABSTRACT

Facial Expression Recognition (FER) is a fundamental computer vision task: classifying human face images into emotion categories such as happy, sad, surprised, scared, and angry. Deep-learning-based FER has recently made great progress. However, whether relying on weight-initialization techniques or attention mechanisms, deep-learning-based FER methods struggle to capture features that are visually insignificant but semantically important. To address this problem, we present a novel FER training strategy consisting of two components: Memo Affinity Loss (MAL) and Mask Attention Fine-Tuning (MAFT). MAL is a variant of center loss that uses a memory-bank strategy together with discriminative centers: it widens the distance between different clusters and narrows the distance within each cluster, so the features extracted by the CNN are comprehensive and independent, yielding a more robust model. MAFT is a strategy that temporarily blindfolds the attention regions and forces the model to learn from other important regions of the input image; it is not only an augmentation technique but also a novel fine-tuning approach. To the best of our knowledge, we are the first to apply a mask strategy to the attention part and to use it to fine-tune models. Finally, to implement these ideas, we construct a new network, Architecture Attention ResNet, based on ResNet-18. Our methods are conceptually and practically simple, yet achieve superior results on popular public FER benchmarks: 88.75% on RAF-DB, 65.17% on AffectNet-7, and 60.72% on AffectNet-8. The code will be open-sourced soon.
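The abstract does not give MAL's exact formulation, but since it is described as a variant of center loss, the core idea can be illustrated with a minimal center-loss sketch. All names and the momentum-style center update below are hypothetical, and the inter-class separation term that MAL adds is omitted; this only shows the intra-class "pull toward the class center" part:

```python
def center_loss_step(features, labels, centers, alpha=0.5):
    """Compute a center-loss value and update class centers with step size alpha.

    features: list of feature vectors (lists of floats)
    labels:   list of int class ids, one per feature vector
    centers:  dict class id -> center vector (updated in place)
    """
    loss = 0.0
    for x, y in zip(features, labels):
        # a class seen for the first time adopts the sample as its center
        c = centers.setdefault(y, list(x))
        # squared distance to own class center (intra-class compactness)
        loss += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        # move the center a small step toward the sample
        for i, xi in enumerate(x):
            c[i] += alpha * (xi - c[i])
    return loss / max(len(features), 1)
```

A full MAL-style objective would additionally keep the centers in a memory bank across batches and penalize centers of different classes for being too close.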
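Similarly, the core of MAFT, temporarily zeroing ("blindfolding") the strongest attention responses so the network must rely on other regions, can be sketched as follows. The function name, the `mask_frac` parameter, and the plain-list representation of the attention map are illustrative assumptions, not the paper's implementation:

```python
def blindfold(attn, mask_frac=0.25):
    """Zero out the top mask_frac fraction of cells in a 2D attention map.

    attn: attention map as a list of rows of floats.
    Returns a masked copy; the original map is left untouched.
    """
    # rank every cell by its attention value, highest first
    cells = [(v, i, j) for i, row in enumerate(attn) for j, v in enumerate(row)]
    cells.sort(reverse=True)
    k = int(len(cells) * mask_frac)  # number of cells to blindfold
    masked = [row[:] for row in attn]
    for _, i, j in cells[:k]:
        masked[i][j] = 0.0  # suppress a most-attended cell
    return masked
```

During fine-tuning, such a mask would be applied to the attention map inside the network, forcing the gradient signal through the remaining, less-attended regions.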

Supplemental Material

icmr-presentation (3531416).mp4 (17.2 MB)


Published in

ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval
June 2022, 714 pages
ISBN: 9781450392389
DOI: 10.1145/3512527

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 254 of 830 submissions, 31%
