BLAN: Bi-directional ladder attentive network for facial attribute prediction
Introduction
Facial attributes represent intuitive semantic features that describe visual properties of face images [1], [2], such as smiling and eyeglasses, contributing to numerous real-world applications, e.g., face verification [3], [4], face recognition [5], [6], and face retrieval [7], [8]. Given a face image, facial attribute prediction aims to estimate whether desired attributes are present by learning discriminative feature representations and constructing accurate attribute classifiers.
Recently, deep convolutional neural networks (CNNs) have gained great popularity and have dramatically improved the performance of state-of-the-art algorithms in facial attribute prediction. In general, deep facial attribute prediction methods can be categorized into two groups: part-based methods [9], [10] and holistic methods [11], [12]. Part-based methods first locate the positions of facial attributes and then extract features according to the obtained location cues for subsequent attribute prediction. In contrast, holistic methods learn attribute relationships and estimate facial attributes from entire face images without any additional localization mechanism.
In this paper, we focus on holistic facial attribute prediction methods. The key insight in this line of work lies in capturing shared and attribute-specific features with customized architectures. Specifically, the customized networks learn features shared by all attributes in low-level layers. These features then flow to high-level layers, which split into multiple branches to predict attributes with different characteristics. However, in this process, only the high-level abstract features at the end of each branch take part in the final attribute prediction. The shared information learned at low-level layers may vanish before reaching the high-level layers [12]. Consequently, low-level features may not be fully explored and utilized.
This deficiency of current holistic facial attribute methods prompts us to reconsider the relationship between the CNN architecture and the features it extracts at each level. Rather than capturing shared and specific features in deep networks, this paper considers leveraging the hierarchical structure of a deep network to learn the locality and globality of facial attribute features. Specifically, low-level CNN layers capture subtle and detailed face features, corresponding to the attributes that appear in local face regions, i.e., local facial attributes. As CNNs go deeper, more global and abstract information is explored to estimate the attributes that rely on the entire face, i.e., global facial attributes. Therefore, the local and global natures of facial attributes can be mapped to the local and global feature representations captured by the low-level and high-level hierarchies of deep networks.
Building on such correlations between feature hierarchies and attribute characteristics, we design a novel Bi-directional Ladder Attentive Network (BLAN) that learns hierarchical feature representations from low levels to high levels and correspondingly predicts facial attributes with locality and globality. BLAN is built on the autoencoder framework with multiple layer-wise bi-directional connections between its encoder and decoder. The encoder and decoder features learned at each level are fed into the proposed Residual Dual Attention Module (RDAM). RDAM adaptively interweaves these features to learn complementary information via residual connections. In addition, it employs dual channel-wise and spatial-wise attention to jointly learn what and where to focus, yielding richer attentive feature representations. To further improve the quality of the interwoven representations learned at each level, a Local Mutual Information Maximization (LMIM) loss is derived to incorporate the locality of input attributes into high-level representations. After that, multiple hierarchical classifiers operate on the learned hierarchical attentive features with maximized mutual information to produce global and local decisions. An adaptive score fusion module then merges the decisions from each level of BLAN, further boosting the final performance. Extensive experiments on two facial attribute datasets, CelebA and LFWA, demonstrate that the proposed method outperforms state-of-the-art methods.
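To make the idea of dual channel-wise and spatial-wise attention with a residual connection more concrete, the following is a minimal, parameter-free sketch in plain Python. It is not the RDAM described in the paper: the function name `dual_attention_residual`, the element-wise sum used to combine encoder and decoder features, and the pooling-plus-sigmoid gates are all assumptions chosen for illustration, whereas a real module would learn its gating functions.

```python
import math


def sigmoid(v):
    """Logistic function used as a soft gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def dual_attention_residual(enc, dec):
    """Combine encoder/decoder feature maps given as nested lists (C x H x W).

    Hypothetical stand-in for an attention module: the gates below are
    computed by simple pooling, not by learned layers.
    """
    channels, height, width = len(enc), len(enc[0]), len(enc[0][0])

    # Assumption: encoder and decoder features are combined by element-wise sum.
    fused = [[[enc[c][i][j] + dec[c][i][j] for j in range(width)]
              for i in range(height)] for c in range(channels)]

    # Channel-wise attention ("what"): one gate per channel,
    # derived from global average pooling over spatial positions.
    channel_gate = [
        sigmoid(sum(sum(row) for row in fused[c]) / (height * width))
        for c in range(channels)
    ]

    # Spatial-wise attention ("where"): one gate per location,
    # derived from the mean over channels at that location.
    spatial_gate = [[
        sigmoid(sum(fused[c][i][j] for c in range(channels)) / channels)
        for j in range(width)] for i in range(height)]

    # Residual connection: the attended features are added back to the
    # fused input, i.e. out = fused + fused * channel_gate * spatial_gate.
    return [[[fused[c][i][j] * (1.0 + channel_gate[c] * spatial_gate[i][j])
              for j in range(width)] for i in range(height)]
            for c in range(channels)]
```

Because of the residual term, the output never discards the fused input: the attention gates only add emphasis on top of it, which is one plausible reading of how a residual connection helps preserve complementary information.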
The main contributions are summarized as follows.
- We propose a novel Bi-directional Ladder Attentive Network (BLAN) that exploits the correlations between low-to-high hierarchy features and local-to-global facial attributes. Layer-wise bi-directional connections are designed on top of the autoencoder framework to learn complementary features from the encoder and the decoder.
- A Residual Dual Attention Module (RDAM) is developed to jointly learn dual channel-wise and spatial-wise attention for interweaving the encoder and decoder features. The residual connection ensures that complementary information is captured.
- A Local Mutual Information Maximization (LMIM) loss is introduced to maximize the deep mutual information between input attentive attribute features and learned abstract representations, yielding improved features at each hierarchy.
- We present an adaptive score fusion strategy that merges local and global decisions from multiple hierarchical attribute classifiers to further boost the performance of facial attribute prediction. Superior experimental results on two facial attribute datasets, CelebA and LFWA, demonstrate the effectiveness of the proposed BLAN.
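The score fusion idea in the last contribution can be sketched as a weighted combination of per-level classifier scores. The snippet below is an illustrative assumption, not the fusion rule used in the paper: it supposes that each attribute has one weight logit per hierarchy level (the hypothetical `fusion_logits`, which would be learned in practice) and that the weights are normalized with a softmax.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(level_scores, fusion_logits):
    """Merge per-level attribute scores into one score per attribute.

    level_scores[l][a]: score for attribute a from the classifier at level l.
    fusion_logits[a][l]: hypothetical weight logit of level l for attribute a
    (assumed learnable; fixed numbers here for illustration).
    """
    num_levels = len(level_scores)
    num_attrs = len(level_scores[0])
    fused = []
    for a in range(num_attrs):
        # Per-attribute weights over levels, summing to 1.
        weights = softmax(fusion_logits[a])
        fused.append(sum(weights[l] * level_scores[l][a]
                         for l in range(num_levels)))
    return fused


# Two levels (e.g. a low-level and a high-level classifier), two attributes.
scores = [[0.2, 0.9],
          [0.6, 0.1]]
uniform = fuse_scores(scores, [[0.0, 0.0], [0.0, 0.0]])
```

With equal logits the fusion reduces to a plain average across levels; larger logits let an attribute lean on the level whose features suit it, for example a low level for a local attribute and a high level for a global one.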
Section snippets
Facial attribute prediction
Existing deep facial attribute prediction works can be grouped into two broad categories: part-based methods and holistic methods. We introduce the two categories in detail below.
Part-based methods extract feature representations from different positions of facial attributes. Each position corresponds to a single attribute classifier. Hence, the key to part-based methods lies in the localization mechanism, which further classifies part-based methods
Bi-directional ladder attentive network
Given facial attribute images, the proposed BLAN first learns hierarchical feature representations from low-level layers to high-level layers under the autoencoder framework, corresponding to local and global features with the locality and the globality of facial attributes. Then, learned representations from both the encoder and the decoder at different hierarchies are fed into multiple residual dual attention modules for interweaving more discriminative attentive features. Next, these
Experiments
In this section, we systematically conduct experiments on two facial attribute datasets: CelebA and LFWA [17]. First, we introduce their descriptions and test protocols. Second, the implementation details involving training schemes, hyperparameter configurations, and attention settings are provided. Third, we compare and discuss our BLAN with state-of-the-art methods. Then, we experimentally illustrate the effectiveness of the hierarchical features learned by BLAN. Finally, the in-depth analysis
Conclusion and future works
In this paper, we study the facial attribute prediction problem by exploiting the correlations between hierarchical features and attributes with locality and globality characteristics. We have proposed a novel Bi-directional Ladder Attentive Network (BLAN) to learn hierarchical representations at different levels of an autoencoder framework. Layer-wise bi-directional connections between the encoder and the decoder help capture richer local and global attribute representations by
Acknowledgements
This work is supported in part by the State Key Development Program (Grant No. 2016YFB1001001), in part by the National Natural Science Foundation of China (NSFC) under Grant U1736119 and Grant U1936117, and in part by the Fundamental Research Funds for the Central Universities under Grant DUT18JC06.
References (42)
- et al., Learning a bi-level adversarial network with global and local perception for makeup-invariant face verification, Pattern Recognit. (2019)
- et al., Learning structured ordinal measures for video based face recognition, Pattern Recognit. (2018)
- et al., Feature fusion for facial landmark detection, Pattern Recognit. (2014)
- et al., Curriculum learning of visual attribute clusters for multi-task classification, Pattern Recognit. (2018)
- et al., Attribute and simile classifiers for face verification, Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2009)
- et al., Walk and learn: facial attribute representation learning from egocentric video and contextual data, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
- et al., DeMeshnet: blind face inpainting for deep meshface verification, IEEE Trans. Inf. Forensics Secur. (TIFS) (2018)
- et al., Wasserstein CNN: learning invariant features for NIR-VIS face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) (2018)
- et al., Two birds, one stone: jointly learning binary code for large-scale face image retrieval and attributes prediction, Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
- et al., Large-scale face image retrieval system at attribute level based on facial attribute ontology and deep neuron network, Asian Conference on Intelligent Information and Database Systems (2018)
- PANDA: pose aligned networks for deep attribute modeling, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Landmark free face attribute prediction, IEEE Trans. Image Process.
- Attributes for improved attributes: a multi-task network utilizing implicit and explicit relationships for facial attribute classification, Proceedings of the 31st AAAI Conference on Artificial Intelligence
- Partially shared multi-task convolutional neural network with local constraint for face attribute learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Poselets: body part detectors trained using 3D human pose annotations, Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Improving facial attribute prediction using semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Segment-based methods for facial attribute detection from partial faces, IEEE Trans. Affective Comput.
- Deep learning face attributes in the wild, Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- A deep cascade network for unaligned face attribute classification, Proceedings of the Conference on Artificial Intelligence (AAAI)
- MOON: a mixed objective optimization network for the recognition of facial attributes, European Conference on Computer Vision (ECCV)
- Deep face recognition, Proceedings of the British Machine Vision Conference 2015 (BMVC)
Xin Zheng received the B.E. degree in Integrated Circuit Design and Integration System from Dalian University of Technology in 2017. She is currently a master's student in the School of Information and Communication Engineering, Dalian University of Technology. Her research interests are in computer vision and pattern recognition.
Huaibo Huang received the B.E. degree in Measurement and Control Technology and Instrument from Xi’an Jiaotong University in 2012, and the M.E. degree in Optical Engineering from Beihang University in 2016. He is currently a Ph.D. student in the Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), CASIA, Beijing, China. His current research interests include computer vision and pattern recognition.
Yanqing Guo received the B.S. and Ph.D. degrees in Electronic Engineering from Dalian University of Technology, China, in 2002 and 2009, respectively. He is currently a professor with the School of Information and Communication Engineering, Dalian University of Technology. His research interests include multimedia security and forensics, digital image processing, deep learning and machine learning.
Bo Wang received the Ph.D. degree from Dalian University of Technology, China, in 2010. He is currently an Associate Professor with the School of Information and Communication Engineering, Dalian University of Technology. His research interests include image forensics and image steganalysis.
Ran He received the BE and MS degrees in computer science from Dalian University of Technology, and the PhD degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, in 2001, 2004, and 2009, respectively. Since September 2010, he has been with the National Laboratory of Pattern Recognition, where he is currently an associate professor. He currently serves as an associate editor of Neurocomputing (Elsevier) and serves on the program committees of several conferences. His research interests include information theoretic learning, pattern recognition, and computer vision.