BLAN: Bi-directional ladder attentive network for facial attribute prediction

doi:10.1016/j.patcog.2019.107155

Pattern Recognition

Volume 100, April 2020, 107155

https://doi.org/10.1016/j.patcog.2019.107155

Highlights

• A novel Bi-directional Ladder Attentive Network (BLAN) is proposed to improve facial attribute prediction.
• Hierarchical representations are learned to exploit the correlations between feature hierarchies and attribute characteristics.
• The Residual Dual Attention Module (RDAM) effectively interweaves features from the encoder and the decoder.
• The Local Mutual Information Maximization (LMIM) loss further incorporates the locality of the input attribute features into the high-level representations and produces high-quality features.
• The adaptive score fusion module effectively merges multiple global and local decisions from all hierarchies.

Abstract

Deep facial attribute prediction has received considerable attention in the past few years and has a wide range of real-world applications. Most existing works extract abstract global features at the high levels of deep neural networks to make predictions. However, local features at low levels, which contain detailed local attribute information, are not well exploited. In this paper, we propose a novel Bi-directional Ladder Attentive Network (BLAN) to learn hierarchical representations that cover the correlations between feature hierarchies and attribute characteristics. BLAN adopts layer-wise bi-directional connections, from low to high levels, based on the autoencoder framework. In this way, hierarchical features with local and global attribute characteristics can be interwoven at each level via multiple Residual Dual Attention Modules (RDAMs). In addition, we derive a Local Mutual Information Maximization (LMIM) loss to further incorporate the locality of facial attributes into the high-level representations at each hierarchy. Multiple attribute classifiers receive the hierarchical representations and produce local and global decisions, which a proposed adaptive score fusion module then merges to yield the final prediction. Extensive experiments on two facial attribute datasets, CelebA and LFWA, demonstrate that BLAN outperforms state-of-the-art methods.

Introduction

Facial attributes represent intuitive semantic features that describe visual properties of face images [1], [2], such as smiling and eyeglasses, contributing to numerous real-world applications, e.g., face verification [3], [4], face recognition [5], [6], and face retrieval [7], [8]. Given a face image, facial attribute prediction aims to estimate whether desired attributes are present by learning discriminative feature representations and constructing accurate attribute classifiers.

Recently, deep convolutional neural networks (CNNs) have gained great popularity and have dramatically improved the performance of state-of-the-art algorithms in the field of facial attribute prediction. In general, deep facial attribute prediction methods can be categorized into two groups: part-based methods [9], [10] and holistic methods [11], [12]. Part-based methods first locate the positions of facial attributes and then extract features according to the obtained location cues for the subsequent attribute prediction. In contrast, holistic methods learn attribute relationships and estimate facial attributes from entire face images without any additional localization mechanism.

In this paper, we focus on holistic facial attribute prediction methods. The insight in this line of work lies in capturing shared and specific attribute features with customized architectures. Specifically, the customized networks learn features shared by all attributes in their low-level layers. These features then flow to the high-level layers, which split into multiple branches to predict attributes with different characteristics. However, in this process, only the high-level abstract features at the end of each branch take part in the final attribute prediction. The shared information captured at low-level layers might vanish by the time it reaches the high-level layers [12]. Consequently, low-level features may not be fully explored and utilized.

This deficiency of current holistic facial attribute methods prompts us to reconsider the relationship between the CNN architecture and the features it extracts at each level. Rather than capturing shared and attribute-specific features in deep networks, this paper leverages the hierarchical structure of a deep network to learn the locality and globality of facial attribute features. Specifically, low-level CNN layers capture subtle and detailed face features, corresponding to the attributes that appear in local face regions, i.e., local facial attributes. As CNNs go deeper, more global and abstract information is explored to estimate the attributes that rely on the entire face, i.e., global facial attributes. Therefore, the local and global natures of facial attributes can be mapped onto the local and global feature representations captured by the low-level and high-level hierarchies of deep networks.

Exploiting these correlations between feature hierarchies and attribute characteristics, we design a novel Bi-directional Ladder Attentive Network (BLAN) that learns hierarchical feature representations from low levels to high levels and uses them to predict local and global facial attributes, respectively. BLAN is built on the autoencoder framework with multiple layer-wise bi-directional connections between its encoder and decoder. The encoder and decoder features learned at each level are fed into the proposed Residual Dual Attention Module (RDAM). RDAM adaptively interweaves these features to learn complementary information via residual connections. In addition, it employs dual channel-wise and spatial-wise attention to jointly learn what and where to focus, yielding richer attentive feature representations. To further improve the quality of the interwoven representations at each level, a Local Mutual Information Maximization (LMIM) loss is derived to incorporate the locality of input attributes into high-level representations. After that, multiple hierarchical classifiers operate on the learned hierarchical attentive features with maximized mutual information to produce global and local decisions. An adaptive score fusion module then merges the decisions from all levels of BLAN, further boosting the final performance. Extensive experiments on two facial attribute datasets, CelebA and LFWA, demonstrate that the proposed method outperforms state-of-the-art methods.
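The interweaving step described in the paragraph above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the element-wise sum that combines encoder and decoder features, the squeeze-and-excitation-style channel gate, the parameter-free spatial gate, and every name used here (`channel_attention`, `spatial_attention`, `residual_dual_attention`, `w1`, `w2`) are assumptions chosen only to show how channel-wise attention ("what"), spatial-wise attention ("where"), and a residual connection can be combined.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Channel gate ("what to focus on").

    feat: (C, H, W) feature map.
    w1:   (C // r, C) and w2: (C, C // r) are hypothetical bottleneck weights.
    Returns a (C, 1, 1) gate with values in (0, 1).
    """
    squeezed = feat.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # ReLU bottleneck
    return sigmoid(w2 @ hidden)[:, None, None]


def spatial_attention(feat):
    """Spatial gate ("where to focus").

    Assumption: a parameter-free gate built from channel-wise mean and max;
    a learned convolution would normally sit here.
    Returns a (1, H, W) gate with values in (0, 1).
    """
    pooled = 0.5 * (feat.mean(axis=0) + feat.max(axis=0))
    return sigmoid(pooled)[None, :, :]


def residual_dual_attention(enc, dec, w1, w2):
    """Interweave encoder and decoder features of the same level.

    enc, dec: (C, H, W) feature maps from the encoder and the decoder.
    The attended features are added back to the fused input (residual path),
    so the original information is kept alongside the attended one.
    """
    fused = enc + dec                           # assumed combination rule
    gate = channel_attention(fused, w1, w2) * spatial_attention(fused)
    return fused + fused * gate                 # residual connection


# Usage with random stand-in features (8 channels, 4x4 spatial size).
rng = np.random.default_rng(0)
enc = rng.standard_normal((8, 4, 4))
dec = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((4, 8))
w2 = rng.standard_normal((8, 4))
out = residual_dual_attention(enc, dec, w1, w2)
print(out.shape)
```

Because both gates lie in (0, 1), the residual form rescales each fused activation by a factor between 1 and 2 in this sketch, so attention can emphasize features but never suppress them below the fused input.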

The main contributions are summarized as follows.

• We propose a novel Bi-directional Ladder Attentive Network (BLAN) that exploits the correlations between low-to-high hierarchy features and local-to-global facial attributes. Layer-wise bi-directional connections are designed based on the autoencoder framework to learn complementary features from the encoder and the decoder.
• A Residual Dual Attention Module (RDAM) is developed to jointly learn dual channel-wise and spatial-wise attention for interweaving the encoder and decoder features. The residual connection ensures that complementary information is captured.
• A Local Mutual Information Maximization (LMIM) loss is introduced to maximize the deep mutual information between input attentive attribute features and learned abstract representations, yielding improved features at each hierarchy.
• We present an adaptive score fusion strategy that merges local and global decisions from multiple hierarchical attribute classifiers to further boost the performance of facial attribute prediction. Superior experimental results on two facial attribute datasets, CelebA and LFWA, demonstrate the effectiveness of the proposed BLAN.
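The last two contributions can be made concrete with a small NumPy sketch. The text above does not give the exact form of the LMIM loss or of the fusion module, so everything here is an assumption for illustration: `jsd_mi_estimate` uses a Jensen-Shannon-style estimator with a dot-product critic, which is one common way to maximize mutual information between local features and a high-level vector, and `fuse_scores` uses a per-attribute softmax over hierarchy levels as one plausible form of adaptive fusion. All function and parameter names are hypothetical.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def jsd_mi_estimate(local_feats, global_vec):
    """Jensen-Shannon-style estimate of local mutual information.

    local_feats: (N, L, D) features at L spatial positions for N images.
    global_vec:  (N, D) high-level representation of each image.
    Positive pairs use locals and the global vector of the same image;
    negative pairs use the global vector of another image (batch shifted
    by one, so N >= 2 is required). The critic is a plain dot product.
    A training loss would be the negative of the returned value.
    """
    shuffled = np.roll(global_vec, 1, axis=0)
    pos = np.einsum("nld,nd->nl", local_feats, global_vec)
    neg = np.einsum("nld,nd->nl", local_feats, shuffled)
    return (-softplus(-pos)).mean() - softplus(neg).mean()


def fuse_scores(level_logits, fusion_weights):
    """Merge decisions from K hierarchy-level classifiers.

    level_logits:   (K, N, A) logits for N images and A attributes.
    fusion_weights: (K, A) unnormalized weights, assumed learnable.
    A softmax over the K levels gives each attribute its own mixture,
    so local attributes can favor low levels and global ones high levels.
    """
    shifted = fusion_weights - fusion_weights.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    return np.einsum("ka,kna->na", weights, level_logits)


# Usage with random stand-in data: 3 levels, 2 images, 4 attributes.
rng = np.random.default_rng(0)
logits = rng.standard_normal((3, 2, 4))
fused = fuse_scores(logits, np.zeros((3, 4)))   # equal weights -> plain mean
predictions = fused > 0.0                        # attribute present if logit > 0
print(predictions.shape)
```

With all fusion weights equal, the module reduces to averaging the level-wise logits; training the weights would let each attribute depart from that average.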

Section snippets

Facial attribute prediction

Existing deep facial attribute prediction works can be generally grouped into two broad categories: part-based methods and holistic methods. We introduce each category in detail below.

Part-based methods extract feature representations from different positions of facial attributes. Each position corresponds to a single attribute classifier. Hence, the key to part-based methods lies in the localization mechanism, which further classifies part-based methods

Bi-directional ladder attentive network

Given facial attribute images, the proposed BLAN first learns hierarchical feature representations from low-level layers to high-level layers under the autoencoder framework, corresponding to local and global features with the locality and the globality of facial attributes. Then, learned representations from both the encoder and the decoder at different hierarchies are fed into multiple residual dual attention modules for interweaving more discriminative attentive features. Next, these

Experiments

In this section, we systematically conduct experiments on two facial attribute datasets: CelebA and LFWA [17]. First, we introduce their descriptions and test protocols. Second, the implementation details involving training schemes, hyperparameter configurations, and attention settings are provided. Third, we compare BLAN with state-of-the-art methods and discuss the results. Then, we experimentally illustrate the effectiveness of the hierarchical features learned by BLAN. Finally, the in-depth analysis

Conclusion and future works

In this paper, we study the facial attribute prediction problem by exploiting the correlations between hierarchical features and the locality and globality of attributes. We have proposed a novel Bi-directional Ladder Attentive Network (BLAN) to learn hierarchical representations at different levels of an autoencoder framework. Layer-wise bi-directional connections between the encoder and the decoder ensure that richer local and global attribute representations are captured by

Acknowledgements

This work is supported in part by the State Key Development Program (Grant No. 2016YFB1001001), in part by the National Natural Science Foundation of China (NSFC) under Grant U1736119 and Grant U1936117, and in part by the Fundamental Research Funds for the Central Universities under Grant DUT18JC06.

Xin Zheng received the B.E. degree in Integrated Circuit Design and Integration System, Dalian University of Technology, in 2017. She is currently a Master Student in the School of Information and Communication Engineering, Dalian University of Technology. Her research interests are in computer vision and pattern recognition.

References (42)

Y. Li et al. Learning a bi-level adversarial network with global and local perception for makeup-invariant face verification. Pattern Recognit. (2019)
R. He et al. Learning structured ordinal measures for video based face recognition. Pattern Recognit. (2018)
P. Perakis et al. Feature fusion for facial landmark detection. Pattern Recognit. (2014)
N. Sarafianos et al. Curriculum learning of visual attribute clusters for multi-task classification. Pattern Recognit. (2018)
N. Kumar et al. Attribute and simile classifiers for face verification. Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2009)
J. Wang et al. Walk and learn: facial attribute representation learning from egocentric video and contextual data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
S. Zhang et al. DeMeshnet: blind face inpainting for deep meshface verification. IEEE Trans. Inf. Forensics Secur. (TIFS) (2018)
R. He et al. Wasserstein CNN: learning invariant features for NIR-VIS face recognition. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) (2018)
Y. Li et al. Two birds, one stone: jointly learning binary code for large-scale face image retrieval and attributes prediction. Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
H.M. Nguyen et al. Large-scale face image retrieval system at attribute level based on facial attribute ontology and deep neuron network. Asian Conference on Intelligent Information and Database Systems (2018)
N. Zhang et al. PANDA: pose aligned networks for deep attribute modeling. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
J. Li et al. Landmark free face attribute prediction. IEEE Trans. Image Process. (2018)
E.M. Hand et al. Attributes for improved attributes: a multi-task network utilizing implicit and explicit relationships for facial attribute classification. Proceedings of the 31st AAAI Conference on Artificial Intelligence (2017)
J. Cao et al. Partially shared multi-task convolutional neural network with local constraint for face attribute learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
L. Bourdev et al. Poselets: body part detectors trained using 3D human pose annotations. Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2009)
M.M. Kalayeh et al. Improving facial attribute prediction using semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
U. Mahbub et al. Segment-based methods for facial attribute detection from partial faces. IEEE Trans. Affective Comput. (2018)
Z. Liu et al. Deep learning face attributes in the wild. Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
H. Ding et al. A deep cascade network for unaligned face attribute classification. Proceedings of the Conference on Artificial Intelligence (AAAI) (2018)
E.M. Rudd et al. MOON: a mixed objective optimization network for the recognition of facial attributes. European Conference on Computer Vision (ECCV) (2016)
O.M. Parkhi et al. Deep face recognition. Proceedings of the British Machine Vision Conference (BMVC) (2015)

Cited by (7)

Learning an attention-aware parallel sharing network for facial attribute recognition
2023, Journal of Visual Communication and Image Representation
Existing multi-task learning based facial attribute recognition (FAR) methods usually employ the serial sharing network, where the high-level global features are used for attribute prediction. However, the shared low-level features with valuable spatial information are not well exploited for multiple tasks. This paper proposes a novel Attention-aware Parallel Sharing network termed APS for effective FAR. To make full use of the shared low-level features, the task-specific sub-networks can adaptively extract important features from each block of the shared sub-network. Furthermore, an effective attention mechanism with multi-feature soft-alignment modules is employed to evaluate the compatibility of the local and global features from the different network levels for discriminating attributes. In addition, an adaptive Focal loss penalty scheme is developed to automatically assign weights to handle the problems of class imbalance and hard example mining for FAR. Experiments demonstrate that the proposed method achieves better performance than the state-of-the-art FAR methods.
Deep learning approaches in face analysis
2020, Learning Control: Applications in Robotics and Complex Dynamical Systems
Although face analysis algorithms have changed over the decades, almost in every face-related algorithm it is still usually the case that the order of the problem solving algorithm is the same. The analysis task is relatively easy on frontal and clear faces. However, when it comes to in-the-wild objects with spontaneous expressions, it is a challenging issue due to the changes in illumination, pose variation, expression intensity, subtle deformations, occlusion, etc. The recent success in deep neural networks makes it inevitable to ignore the technique in face analysis to automatically learn the discriminative representations of the face. In a deep network, the input data is run through several hidden layers that decompose the features of the input. The features are then classified using a function to retrieve the class probabilities to predict the output class. This chapter will investigate the deep learning approaches used to detect and pre-process the face, estimate its attributes, classify the expression and recognize the face.
Feature-Guided Perturbation for Facial Attribute Classification
2023, IEEE Transactions on Artificial Intelligence
Face Attribute Recognition Combining Feature Fusion and Task Grouping
2023, Jisuanji Gongcheng/Computer Engineering
Facial Attributes Recognition Combined with Feature Decoupling and Static-Dynamic Joint Graph Convolutional Network
2022, Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
Prior-Guided Multi-scale Fusion Transformer for Face Attribute Recognition
2022, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Huaibo Huang received the B.E. degree in Measurement and Control Technology and Instrument from Xi’an Jiaotong University in 2012, and the M.E. degree in Optical Engineering from Beihang University in 2016. He is currently a Ph.D. student in the Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), CASIA, Beijing, China. His current research interests include computer vision and pattern recognition.

Yanqing Guo received the B.S. degree and Ph.D. degree in Electronic Engineering from Dalian University of Technology of China, in 2002 and 2009, respectively. He is currently a professor with School of Information and Communication Engineering, Dalian University of Technology. His research interests include multimedia security and forensics, digital image processing, deep learning and machine learning.

Bo Wang received the Ph.D. degree from Dalian University of Technology, China, in 2010. He is currently an Associate Professor with the School of Information and Communication Engineering, Dalian University of Technology. His research interests include image forensics and image steganalysis.

Ran He received the BE and MS degrees in computer science from Dalian University of Technology, and the PhD degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, in 2001, 2004, and 2009, respectively. Since September 2010, he has been with the National Laboratory of Pattern Recognition, where he is currently an associate professor. He currently serves as an associate editor of Neurocomputing (Elsevier) and serves on the program committees of several conferences. His research interests include information theoretic learning, pattern recognition, and computer vision.
