
Computer Standards & Interfaces

Volume 42, November 2015, Pages 105-112

Adaptive Cascade Deep Convolutional Neural Networks for face alignment

https://doi.org/10.1016/j.csi.2015.06.004

Highlights

  • In this paper, an adaptive Cascade Deep Convolutional Neural Networks (ACDCNN) framework is proposed for face alignment prior to face recognition.

  • A new convolutional network structure with three convolutional layers and three fully-connected layers is introduced. A Gaussian distribution is utilized to model the output error of the previous networks and to adjust network configurations adaptively.

  • Experiments show that our method achieves better accuracy than state-of-the-art methods, with low complexity and good robustness.

Abstract

Deep convolutional network cascades have been successfully applied to face alignment. The configuration of each network, including the strategy for selecting local patches for training and the input range of those patches, is crucial for achieving the desired performance. In this paper, we propose an adaptive cascade framework, termed Adaptive Cascade Deep Convolutional Neural Networks (ACDCNN), which adjusts the cascade structure adaptively. A Gaussian distribution is utilized to bridge the successive networks. Extensive experiments demonstrate that our proposed ACDCNN achieves state-of-the-art accuracy with reduced model complexity and increased robustness.

Introduction

Face alignment, or facial landmark localization, plays a critical role in many visual applications such as face recognition, face tracking, facial expression recognition and 3D face modeling. Therefore, it has been extensively studied in recent years. However, robust facial landmark detection remains a challenging problem when face images are taken under extreme occlusion, lighting, expression and pose variations. To address this issue, researchers have explored modeling shape and appearance variation for improved performance. In general, this research can be categorized into three groups: constrained local model based methods [2], [3], [4], active appearance model based methods [5], [6] and regression based methods [1], [7], [8], [9], [10].

Constrained local models build classifiers called component detectors to search for each facial feature point independently. These component detectors compute response maps to represent the appearance variation around facial feature points. Due to ambiguity and corruption in local features, facial points detected by the local experts may lie far from the ground-truth positions. Shape constraints are therefore applied to adjust the initial positions for improved results [2], [4]. However, global contextual information is difficult to embed into these methods.
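The detector-plus-shape-constraint pipeline described above can be sketched in a few lines (a toy illustration, not the cited implementations; the simple blending step stands in for a fitted shape model, and all names are ours):

```python
import numpy as np

# Toy sketch of a constrained local model: each component detector yields a
# response map per landmark; the peak is the initial estimate, and a shape
# constraint pulls implausible points toward a prior (mean) shape.
def clm_fit(response_maps, prior_shape, alpha=0.5):
    # response_maps: (n_points, H, W); prior_shape: (n_points, 2) as (x, y)
    peaks = np.array([
        np.unravel_index(np.argmax(r), r.shape)[::-1]  # (x, y) of the peak
        for r in response_maps
    ], dtype=float)
    # crude stand-in for the shape constraint: blend peaks with the prior
    return (1.0 - alpha) * peaks + alpha * prior_shape
```

With alpha = 0 the fit trusts the local detectors entirely; increasing alpha expresses exactly the trade-off discussed above, at the cost of ignoring image evidence.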

Instead of modeling the appearance around each facial point, active appearance models such as the Active Appearance Model (AAM) [5] take a holistic perspective to model appearance variation. An AAM is composed of a linear shape model and a linear texture model, with Principal Component Analysis (PCA) used to relate the two. Nevertheless, simple linear models can hardly capture the nonlinear variations of facial appearance when faces are taken in complex environments (e.g., extreme lighting).
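The linear shape model at the core of an AAM can be illustrated with PCA alone (a minimal sketch under our own naming; a full AAM additionally carries a texture model and warping, omitted here):

```python
import numpy as np

# Minimal PCA shape model: shapes are flattened landmark vectors; PCA gives
# a mean plus a linear basis, and any shape is approximated as
# mean + basis^T @ coefficients.
def fit_shape_model(shapes, n_components):
    # shapes: (n_samples, 2 * n_points), each row a flattened landmark set
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # principal directions from the SVD of the centered data
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]              # (n_components, 2 * n_points)
    return mean, basis

def project(shape, mean, basis):
    coeffs = basis @ (shape - mean)        # shape parameters
    return mean + basis.T @ coeffs         # reconstruction in the subspace
```

The reconstruction is confined to the linear subspace spanned by the training shapes, which is precisely why such models struggle with the nonlinear variations noted above.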

Regression based methods, on the other hand, directly learn a regression function from image appearance (features) to the target output (shapes) [11]. A cascade architecture is usually employed in regression based models. In each stage of the cascade, shape-indexed features [12] are extracted to predict the shape increment, using linear regression [7] or tree-based regression [8], with the mean shape serving as the initialization. Coarse-to-Fine Auto-Encoder Networks (CFAN) [9] utilize a Stacked Auto-encoder Network [13] to predict the face shape quickly by taking the whole face as input. DCNN [1] employs a deep CNN model to extract high-level features and make an accurate initial prediction. After initialization, DCNN uses two levels of convolutional networks to refine each landmark separately, taking local regions as input. Several factors are critical for training these networks well. For example, Sun et al. [1] conduct extensive experiments to investigate different network structures, which serve as the basic regression units. The input range of local regions and the strategy for selecting local patches for training are other major factors with great impact on accuracy and reliability, yet in traditional methods these factors are set by intuition or empirically. Besides, the relationship between successive networks remains underexplored.
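The generic cascaded-regression update described above can be sketched as follows (an illustrative skeleton, not the implementations of [7], [8]; the feature extractor and the regressors are stand-ins supplied by the caller):

```python
import numpy as np

# Cascaded regression: starting from the mean shape, each stage extracts
# shape-indexed features at the current estimate and applies a learned
# regressor to predict an additive shape increment.
def run_cascade(image, mean_shape, regressors, features):
    shape = mean_shape.copy()
    for R in regressors:
        phi = features(image, shape)   # shape-indexed features
        shape = shape + R @ phi        # additive shape increment
    return shape
```

Because the features are re-indexed by the current shape at every stage, each regressor only needs to correct the residual error left by its predecessors.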

In this paper, we propose Adaptive Cascade Deep Convolutional Neural Networks (ACDCNN) for facial point detection. After the shape is initialized by a CNN model as in DCNN, each landmark is refined by a series of networks. Each network takes the output of the previous network as input and predicts a new position for the landmark. Different from existing methods [1], [9], which apply the same regression configuration to each landmark or facial component within a stage, we set the configuration according to the result for each landmark. In addition, a Gaussian distribution is used to model the output error of the previous network; the input range of the local region is derived from the mean and standard deviation of this distribution. Once the input range is determined, patches centered at positions shifted from the ground-truth position are taken for training. Instead of sampling these patches uniformly at random, we draw them from that Gaussian distribution, so that the most relevant image patches are selected for training the successive network. These better training samples lead to better performance. Comparison experiments show that the proposed ACDCNN outperforms or is comparable to state-of-the-art methods in both robustness and accuracy.
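The adaptive sampling idea can be sketched as follows (a minimal illustration under our own naming; the 3-sigma rule used for the crop range is an assumption for the sketch, not necessarily the paper's exact choice):

```python
import numpy as np

# The previous network's localization errors (predicted minus ground truth,
# per landmark) are modeled with a Gaussian; the crop range for the next
# network follows from its statistics, and training-patch offsets are drawn
# from the same distribution rather than uniformly at random.
def fit_error_model(errors):
    # errors: (n_samples, 2) offsets from ground truth for one landmark
    mu = errors.mean(axis=0)
    sigma = errors.std(axis=0)
    return mu, sigma

def input_range(mu, sigma, k=3.0):
    # crop half-width large enough to cover k standard deviations of error
    return np.abs(mu) + k * sigma

def sample_offsets(mu, sigma, n, rng):
    # patch-center offsets for training, drawn from the fitted Gaussian
    return rng.normal(mu, sigma, size=(n, 2))
```

A landmark that the previous stage already localizes well thus receives a tight crop and tightly clustered training patches, while a poorly localized landmark gets a wider search region, which is the adaptivity claimed above.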

The rest of the paper is organized as follows. Section 2 introduces related work, followed by our proposed ACDCNN in Section 3. Implementation details are described in Section 4. Section 5 reports our experimental results, followed by conclusions in Section 6.


Related work

Many approaches to face alignment have been reported in the past decades, among which regression based methods show highly efficient and accurate performance and have thus received increasing attention. Valstar et al. [14] develop support vector regression to model the nonlinear transform from input local features (Haar-like features) to target point locations. Dantone et al. [15] extend regression forests [16] to conditional regression forests. Head poses are utilized in the framework as

Adaptive Cascade Deep Convolutional Neural Networks

In this section, we present a novel method, termed ACDCNN, for face alignment. The details of each component of ACDCNN, including the initialization with a Deep Convolutional Neural Network and the Local Adaptive Cascade Networks (LACN), are explained.

Structures

Networks at different levels follow a similar architecture with varied input sizes. Table 1 summarizes the input sizes for different facial components. All networks are trained on raw RGB pixel values. The networks used in the first level and for the left and right eyes have higher input resolution, since whole faces and eye regions are observed to contain richer contextual information than the nose and mouth regions.
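Because each input resolution must survive the same conv/pool stack, the admissible sizes are constrained by simple arithmetic. The sketch below shows only that size arithmetic; the kernel and pooling parameters are illustrative placeholders, not the values from Table 1:

```python
# Spatial output size of one convolution (or pooling) operation.
def conv_out(size, kernel, stride=1, pad=0):
    return (size + 2 * pad - kernel) // stride + 1

# Propagate an input size through a stack of conv/pool operations,
# given as (kernel, stride, pad) triples in order.
def stack_out(size, layers):
    for k, s, p in layers:
        size = conv_out(size, k, s, p)
    return size

# e.g. three 5x5 conv layers (stride 1), each followed by 2x2 max pooling
example_stack = [(5, 1, 0), (2, 2, 0)] * 3
```

Running `stack_out` over candidate resolutions is a quick way to check that a chosen input size leaves a valid (positive) feature-map size before the fully-connected layers.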

Training

All networks are trained by stochastic gradient descent
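The stochastic gradient descent mentioned above can be illustrated on a linear stand-in for the landmark regressor (squared localization error; the learning rate, batch size and epoch count are arbitrary choices for the sketch, and momentum and weight decay are omitted):

```python
import numpy as np

# Mini-batch SGD on a linear model W minimizing 0.5 * ||X W - Y||^2.
def sgd_train(X, Y, lr=0.1, epochs=200, batch=4, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        order = rng.permutation(len(X))          # reshuffle each epoch
        for start in range(0, len(X), batch):
            b = order[start:start + batch]
            # gradient of the squared error, averaged over the mini-batch
            grad = X[b].T @ (X[b] @ W - Y[b]) / len(b)
            W -= lr * grad
    return W
```

The same update rule, applied layer by layer via backpropagation, trains the convolutional networks; only the model being differentiated changes.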

Experiments

In this section, we first describe the datasets used for evaluation. Every cascade network for each facial landmark in our method is then investigated. Next, a comparison with DCNN is conducted on the same training and validation sets. Finally, we compare the proposed ACDCNN with state-of-the-art methods and commercial software.

Conclusions and future work

In this paper, an adaptive cascade framework for face alignment is proposed. A Gaussian distribution is used to bridge the current network's input with the previous network's output. Each landmark is refined independently based on its previous statistical information, on which the adaptive sampling strategy for selecting training patches depends. The benefit of adaptive sampling is that the most relevant image patches are exploited for training the deep convolutional neural networks. We show that the


References (28)

  • J. Zhang et al.

    Coarse-to-Fine Auto-Encoder Networks (CFAN) for real-time face alignment

  • Y. Sun et al.

    Deep convolutional network cascade for facial point detection

  • P.N. Belhumeur et al.

    Localizing parts of faces using a consensus of exemplars

  • T.F. Cootes et al.

    Robust and accurate shape model fitting using random forest regression voting

  • X. Zhu et al.

    Face detection, pose estimation, and landmark localization in the wild

  • T.F. Cootes et al.

    Active appearance models

  • X. Gao et al.

    A review of active appearance models

    IEEE Trans. Syst. Man Cybern. C Appl. Rev.

    (2010)
  • X. Xiong et al.

    Supervised descent method and its applications to face alignment

  • S. Ren et al.

    Face alignment at 3000 fps via regressing local binary features

  • Z. Zhang et al.

    Facial landmark detection by deep multi-task learning

  • N. Wang et al.

    Facial Feature Point Detection: A Comprehensive Survey, arXiv preprint arXiv:1410.1037

    (2014)
  • P. Dollár et al.

    Cascaded pose regression

  • G.E. Hinton et al.

    Reducing the dimensionality of data with neural networks

    Science

    (2006)
  • M. Valstar et al.

    Facial point detection using boosted regression and graph models


    Dong Yuan is an associate professor at Beijing University of Posts and Telecommunications, China. He is also invited as “France Telecom — Orange Expert on Solution of Content Service” of the France Telecom R&D Group. He received his PhD degree from Shanghai Jiao Tong University in 1999, worked as an R&D scientist at Nokia Research Center China from 1999–2001, and worked as post-doctoral research staff in the Engineering Department of Cambridge University, UK, from 2001–2003. His current research interests include semantic video indexing, video copy detection, and multimedia content search.

    Wu Yue is a postgraduate student at Beijing University of Posts and Telecommunications, China. He received the B.S. degree in Electronic Information Engineering from Beijing University of Posts and Telecommunications in 2013. His current research interests are face tracking, face alignment, face recognition, object detection and deep learning.

    The work is sponsored by the Chinese NSFC project 61372169.
