Adaptive optimal multi-features learning based representation for face hallucination
Introduction
Facial image analysis is a compelling research field that has been widely explored in computer vision and pattern recognition. Accordingly, several applications, e.g., face detection, tracking, alignment, feature extraction, recognition, and face-sketch synthesis, have been introduced and investigated. In recent years, these techniques have played an essential role in many real-world tasks, e.g., person identification, human–machine interaction, security management, surveillance systems, law enforcement, and digital entertainment (Shi & Zhao, 2019). However, the performance of all these face analysis tasks degrades significantly when the input face images are acquired under undesirable conditions such as large distance, poor illumination, and low resolution. In particular, low resolution is a critical and frequently encountered problem, usually caused by the subject's large distance from the imaging system.
LR images impose several constraints on real-life face analysis applications, which require high-quality images to achieve the desired performance. Therefore, before using the original LR images in these applications, it is necessary to obtain their corresponding high-resolution counterparts. HR images can be restored from given LR images through super-resolution (SR) techniques (Wang et al., 2014). Based on the recent literature, existing SR methods can be grouped into two classes: (i) interpolation-based SR (Dong et al., 2013, Sun et al., 2011, Zhang et al., 2012, Zhang and Wu, 2008) and (ii) learning-based SR (Baker and Kanade, 2000, Chang et al., 2004, Jiang et al., 2016, Jiang et al., 2014a, Jiang et al., 2014b, Jiang et al., 2017, Jiang, Yu et al., 2018, Ma et al., 2010). Interpolation-based SR methods obtain HR images by estimating prior knowledge (statistical properties) from the available LR images; however, they perform poorly when an application requires a high magnification factor. In learning-based SR methods, HR images are rendered by learning the mapping relationship between HR and LR exemplar/training sets. These methods obtain impressive results, specifically under higher magnification factors, owing to informative and large training sets. Thus, in this work, we primarily focus on learning-based SR methods.
The pioneering FH model was presented by Baker and Kanade (2000); it obtains HR images by employing a Bayesian approach. Based on the LR image representation scheme, existing learning-based SR methods can be broadly grouped into two subcategories: (i) global-features based and (ii) local-patch based.
In the global-features based approach, the entire LR face image is represented using a traditional statistical model. A few notable representative works are: Principal Component Analysis (PCA) (Wang & Tang, 2005), kernel PCA (Chakrabarti et al., 2007), Singular Value Decomposition (SVD) (Jian and Lam, 2015, Jian et al., 2013), Canonical Correlation Analysis (CCA) (Huang et al., 2010), two-dimensional CCA (An & Bhanu, 2014), Locality Preserving Projections (LPP) (Zhuang et al., 2007), and Non-negative Matrix Factorization (NMF) (Wang et al., 2014). These methods compute the optimal coefficients of the training samples in the LR space to represent the LR face image and then apply the same coefficients in the corresponding HR space to obtain the HR face image. Though global methods are easy to implement, they fail to preserve detailed facial features, especially when the input image differs substantially from the training images.
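The coefficient-transfer idea behind these global methods can be sketched as follows. The helper name, the ridge term `eps`, and the toy data below are illustrative assumptions, not the formulation of any particular cited work:

```python
import numpy as np

def global_coeff_transfer(lr_train, hr_train, lr_test, eps=1e-3):
    """Represent an LR face as a linear combination of LR training faces,
    then reuse the same coefficients over the aligned HR training faces."""
    A = lr_train                                   # (d_lr, n): one face per column
    # Ridge-regularized least squares keeps the solve well-conditioned.
    w = np.linalg.solve(A.T @ A + eps * np.eye(A.shape[1]), A.T @ lr_test)
    return hr_train @ w                            # hallucinated HR face (vectorized)

# toy usage: an LR face that is an exact mixture of the training faces
rng = np.random.default_rng(0)
lr_train = rng.standard_normal((16, 5))            # 5 vectorized 4x4 LR faces
hr_train = rng.standard_normal((64, 5))            # aligned 8x8 HR faces
true_w = np.array([0.2, 0.3, 0.1, 0.25, 0.15])
hr_face = global_coeff_transfer(lr_train, hr_train, lr_train @ true_w)
```

Because the mapping is learned over whole faces, any local detail absent from the training set cannot be recovered, which is exactly the weakness noted above.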
In contrast, local patch-based methods provide better performance by effectively exploiting the local manifold structure of the training data. They first divide the LR image into several small, equal-sized regions called patches, and then represent each patch using the training samples. Roweis and Saul (2000) propose the locally linear embedding (LLE) method for non-linear dimensionality reduction. Following the assumption that HR patches and their LR counterparts share a similar local spatial structure, Chang et al. (2004) introduce a neighbor embedding (NE) based SR approach, which reconstructs the target HR patch using the nearest neighbors of the input LR patch. Jiang et al. (2016) adopt a similar idea; they improve the results by employing Tikhonov regularization, which suppresses excessive variation in the data fidelity term of the objective function. Later, Zhang and Cham (2011) introduce the LLE approach in the discrete cosine transformation (DCT) domain to predict local facial features through the AC coefficients.
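The NE reconstruction of a single patch can be sketched in the LLE style. The function name, the neighborhood size `k`, and the Gram-matrix regularizer `reg` are illustrative assumptions rather than the exact settings of the cited works:

```python
import numpy as np

def neighbor_embedding_patch(lr_patches, hr_patches, lr_query, k=5, reg=1e-4):
    """Reconstruct one HR patch from the K nearest LR training patches
    using LLE-style sum-to-one reconstruction weights."""
    d = np.linalg.norm(lr_patches - lr_query, axis=1)
    nn = np.argsort(d)[:k]                         # indices of K nearest neighbors
    Z = lr_patches[nn] - lr_query                  # neighbors centered on the query
    G = Z @ Z.T + reg * np.eye(k)                  # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                                   # enforce the sum-to-one constraint
    return w @ hr_patches[nn]                      # transfer weights to HR patches

# toy usage with random, aligned LR/HR patch pairs
rng = np.random.default_rng(1)
lr_patches = rng.standard_normal((50, 9))          # 50 vectorized 3x3 LR patches
hr_patches = rng.standard_normal((50, 36))         # aligned 6x6 HR patches
hr_patch = neighbor_embedding_patch(lr_patches, hr_patches, lr_patches[0])
```

The same weights computed in the LR space are applied to the HR neighbors, which is exactly the shared-manifold assumption that later methods in this survey question.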
Considering the similar position of an image patch, Ma et al. (2010) introduce a least-squares representation (LSR) based SR model, which reconstructs the input LR patch using the identically positioned LR training patches. However, its results become unstable when the input patch has a lower dimension than the number of training samples. Subsequently, several methods adopt the concept of positioned training patches to reconstruct the target HR image. Jung et al. (2011) address the instability problem of Ma et al. (2010) by introducing an ℓ1-norm constraint, which yields more stable HR faces. Huang and Wu (2011) present an SR approach that approximates the non-linear relationship between LR and HR faces using multiple local linear transformations. Shi et al. (2014) propose a unified FH framework that jointly incorporates local and global priors using a local sparsity model together with a pixel correlation model. Later, Jiang et al. (2017) include smooth priors in a sparse representation based model, which assigns similar coefficients to similar training samples.
Jiang et al. (2014b) present a locality-constrained representation (LcR) based FH method that shows promising performance by revealing the local topology of the nonlinear manifold. Meanwhile, Jiang et al. (2014a) propose an FH method that reconstructs the HR face using multi-layer LcR with iterative NE and intermediate dictionary learning (LINE). Later, Liu et al. (2018) adopt the idea of locality and propose a noise-robust FH model, namely LcR with bi-layer representation (RLcBR), which obtains promising results under high impulsive noise. Recently, Jiang, Yu et al. (2018) extend the LcR method by incorporating contextual patch information and present a thresholding based LcR with a reproducing learning scheme (TLcR-RL). Due to the additional contextual patch information, TLcR-RL achieves better reconstruction performance and can also handle small misalignment problems. However, owing to the large number of patches formed, it takes much time to reconstruct the target HR image, which is unsuitable for real-time applications such as LR face recognition. Recently, Nagar et al. (2020) propose a threshold-based pixel-wise dictionary learning LcR (T-DLcR) method, which effectively hallucinates LR faces corrupted with high impulse noise. Further, Nagar et al. (2021) present a PCA-mate face based mixed-noise robust face SR scheme, namely residual-learning based error suppressed nearest neighbor representation (RLENR), which effectively addresses the mixed-noise problem.
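The locality idea shared by this family of methods admits a compact closed form. The sketch below follows the general LcR form (data fidelity plus a distance-weighted penalty) with an illustrative `tau`; it omits the sum-to-one constraint that LcR additionally enforces via a Lagrangian, so it is a simplification rather than the cited formulation:

```python
import numpy as np

def lcr_weights(Y, x, tau=0.04):
    """Locality-constrained weights for one position-patch:
    minimize ||x - Y w||^2 + tau * sum_i (d_i * w_i)^2, where d_i is the
    distance from x to the i-th training patch (i-th column of Y)."""
    d = np.linalg.norm(Y - x[:, None], axis=0)     # locality penalties
    w = np.linalg.solve(Y.T @ Y + tau * np.diag(d ** 2), Y.T @ x)
    return w                                       # sum-to-one constraint omitted

# toy usage: a query identical to one training patch gets all its weight,
# since its locality penalty d_0 is zero
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 6))                   # 6 training patches as columns
w = lcr_weights(Y, Y[:, 0])
```

Distant training patches receive large penalties and thus near-zero weights, which is how LcR reveals the local manifold topology mentioned above.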
With advancements in neural networks, many researchers have turned their attention to deep learning strategies for addressing pattern recognition and computer vision problems. For the image SR problem, Dong et al. (2016) present the first three-layer convolutional neural network (CNN), which learns a mapping function between LR and HR images. Kim et al. (2016) further improve SR performance by introducing a very deep CNN for HR image reconstruction. Zhu et al. (2016) present a cascaded deep bi-network based FH model that alternately optimizes dense field estimation and hallucination; moreover, its gated bi-network recovers texture details at different levels. Song et al. (2017) first employ an existing CNN to generate HR facial components and then apply an enhancement strategy that combines these components with a face image obtained from the training HR images. Cao et al. (2017) present an FH model utilizing an attention-aware scheme, which chooses the most appropriate facial components via reinforcement learning; they further use an enhancement network to reconstruct local facial details. Huang et al. (2017) propose a face SR model that predicts the wavelet coefficients of the training HR images before reconstructing the target HR image, capturing local texture details and global topology information of facial images through a flexible and extensible CNN. Shi et al. (2019) employ deep reinforcement learning in an attention-aware face SR framework that attends to a series of patches and enhances different facial parts by exploiting the global inter-dependency of images. Recently, Li et al. (2020) propose a deep face dictionary network (DFDNet) for blind face restoration; it employs a K-means approach to generate deep dictionaries of perceptually significant face components.
Though the aforementioned deep learning methods attain impressive SR performance on noise-free facial images, they fail when the input LR faces are contaminated with heavy noise.
Recently, some researchers have focused on exploiting different image features to attain complementary and rich details in the reconstructed facial image. For example, Liu et al. (2020) propose a multiple-feature learning and hierarchical structure (MLHS) based face SR model, which utilizes several horizontal and vertical gradient features in the LR space; it employs two levels, a middle layer and a higher layer, for hallucinating the input LR faces. Chen et al. (2019) present a joint-learning-based contextual face SR model for noisy LR faces, which jointly utilizes HOG features with subdivided contextual-patch information to capture complementary details in the reconstructed HR face image. Though these methods improve the reconstruction performance for LR face images affected by very low levels of Additive White Gaussian Noise (AWGN), they cannot effectively deal with LR images corrupted by high noise.
Recently, a few face SR methods, e.g., Chen et al. (2019) and Liu et al. (2020), utilize a number of image features along with the basic pixel information. The problem here is that a predefined proportion of these features is fixed for all input images, even though each image may carry distinct local geometrical information. Existing feature-based methods fix these values by experimentally analyzing the reconstruction results on a set of test images. Practically, this is not an optimal way to calculate the appropriate proportion of each feature for different input LR images. For better reconstruction, an accurate proportion of these features can be obtained for individual test images using existing optimization techniques. Recently, swarm intelligence (SI) based approaches have achieved great success in solving such problems. SI based approaches mimic the social behavior of swarms, flocks, herds, or schools of creatures in the environment. Some recent and popular SI based methods include Particle swarm optimization (PSO) (Kennedy & Eberhart, 1995), Ant colony optimization (ACO) (Dorigo et al., 2006), the Artificial bee colony algorithm (ABC) (Basturk & Karaboga, 2006), the Bat-inspired algorithm (BA) (Yang, 2010), and the Cuckoo search algorithm (CS) (Yang & Deb, 2009). PSO is mainly inspired by the flocking behavior of birds: multiple particles (candidate solutions) follow both the best particle (the solution with the best position) and their own personal best positions acquired so far. The ACO algorithm is inspired by the collective behavior of ants foraging along the shortest path. The ABC algorithm mimics the social functioning of bees in finding food sources. The BA algorithm is inspired by the echolocation behavior of bats in finding and hunting their prey. The CS algorithm is inspired by the brood parasitism of a few cuckoo species that lay their eggs in the nests of other host birds.
Recently, Mirjalili et al. (2014) propose the Grey wolf optimization (GWO) algorithm, which mimics the social behavior of grey wolves in searching for and hunting prey. It effectively solves optimization problems for several numerical benchmark functions without getting stuck in local optima. Moreover, GWO requires tuning only a single parameter; hence, it is well suited to real-world problems. Owing to these advantages, GWO has been widely employed in various computer science applications, e.g., image processing (Li et al., 2016, Rajput et al., 2019), machine learning (Emary et al., 2016, Mosavi et al., 2016), scheduling (Abualigah et al., 2020, Jiang and Zhang, 2018, Jiang et al., 2018), and system reliability (Kumar et al., 2017, Kumar et al., 2019). Due to its simplicity and convergence capability, it can also be adopted to solve the problem mentioned earlier, i.e., finding the optimal proportion of multiple features to adequately represent the LR face image using the LR-HR training samples.
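A minimal sketch of the GWO update follows: the pack encircles the three best wolves (alpha, beta, delta) with a linearly decaying coefficient `a`. The bounds, population size, and the sphere test function are illustrative choices, not the settings used in this paper:

```python
import numpy as np

def gwo_minimize(f, dim, n_wolves=20, iters=100, lb=-5.0, ub=5.0, seed=0):
    """Minimal grey wolf optimizer: every wolf moves toward the average
    of three encircling steps taken around alpha, beta, and delta."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))       # initial pack positions
    for t in range(iters):
        order = np.argsort([f(x) for x in X])
        leaders = X[order[:3]].copy()              # alpha, beta, delta wolves
        a = 2.0 * (1.0 - t / iters)                # decays linearly from 2 to 0
        for i in range(n_wolves):
            steps = []
            for leader in leaders:
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2.0 * a * r1 - a, 2.0 * r2  # exploration/exploitation coefficients
                steps.append(leader - A * np.abs(C * leader - X[i]))
            X[i] = np.clip(np.mean(steps, axis=0), lb, ub)
    best = min(X, key=f)
    return best, f(best)

# usage: minimize the 3-dimensional sphere function
best, val = gwo_minimize(lambda p: float(np.sum(p ** 2)), dim=3)
```

Early iterations (|A| > 1) favor exploration while late ones (|A| < 1) favor exploitation, which is why a single decaying parameter suffices, as noted above.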
Though the aforementioned methods achieve satisfactory performance on the image SR problem, they possess a few limitations. First, only spatial pixel information along with a single type of image feature (e.g., gradient features, HOG features) has been exploited to learn the mapping function or coefficient representation in the corresponding LR-HR space. This limited feature information is insufficient to represent the LR image accurately, resulting in unsatisfactory HR image reconstruction. Indeed, considering diverse image features may capture additional pattern descriptions that provide complementary information and better reconstruction results. Second, the assumption that the LR and HR spaces share a similar manifold structure is not always satisfied, owing to the one-to-many relationship between HR and LR samples. Consequently, the geometry of the LR patch space cannot represent the true neighboring relationships of the HR space. Third, traditional methods generate inappropriate reconstruction coefficients when the LR observation and the training samples differ considerably in appearance due to environmental factors, e.g., camera sensor noise, varying illumination, or haze. Therefore, the above methods cannot achieve the desired reconstruction performance, especially for noisy LR images. To resolve these limitations, this paper presents an adaptive multi-feature learning-based neighbor representation to effectively hallucinate both noise-free and noisy LR facial images. The primary contributions of this work are summarized as follows:
- Incorporate several informative transformation-domain image features (e.g., multi-order gradients, contrast-preserving features, multi-resolution features) along with spatial pixel information to achieve consistent and high reconstruction performance.
- Formulate a min-type objective function for determining the adaptive, optimal proportion of each predetermined image feature, and employ the widely adopted GWO to obtain an optimal and stable solution.
- Obtain the optimal representation of the observed LR image and its corresponding features by determining appropriate threshold values for the respective LR training samples.
- Validate the performance of the presented SR model on two standard face databases, one real-world image dataset, and locally captured low-resolution surveillance images.
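As an illustration of the first contribution, the sketch below stacks a few transformation-domain features of the kinds listed above: first-order gradients, a Laplacian as a second-order, contrast-type feature, and a coarse-scale image as a multi-resolution feature. The exact feature set and the proportions learned in the proposed method differ; this is only an assumed, minimal feature stack:

```python
import numpy as np

def multi_features(img):
    """Stack raw pixels with a few transformation-domain features.
    Assumes an even-sized grayscale image; illustrative only."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                       # first-order gradients
    lap = np.gradient(gy, axis=0) + np.gradient(gx, axis=1)  # Laplacian (second order)
    h, w = img.shape
    coarse = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # 2x2 block means
    multi_res = np.kron(coarse, np.ones((2, 2)))    # coarse scale, upsampled back
    return np.stack([img, gy, gx, lap, multi_res])  # (5, h, w) feature stack

feats = multi_features(np.arange(64, dtype=float).reshape(8, 8))
```

Each channel of such a stack can then be represented over its own feature-specific training samples, with the per-feature proportions left to the optimizer rather than fixed by hand.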
The rest of the paper is organized as follows. Section 2 defines the mathematical notation used in this work and briefly reviews some seminal position-patch-based face SR methods. Section 3 details the proposed methodology. Section 4 presents the results of various experiments under optimal parameter settings, demonstrating the effectiveness of the proposed face SR method with respect to existing state-of-the-art SR methods. Section 5 concludes the paper and outlines future work.
Related work
In recent years, position-patch-based face SR methods have gained considerable attention owing to their better reconstruction capability, especially under substantial magnification factors. Moreover, these methods obtain different reconstruction coefficients for nearly similar local patch representations, which helps attain more detailed reconstructed features. Considering these advantages, we follow the position-patch-based mechanism to reconstruct the target HR face image. A brief review of these methods is provided below.
Proposed methodology
This section comprises a detailed description of the proposed OMFPL method. The flow diagram of OMFPL is depicted in Fig. 1. The complete working pipeline of the proposed face hallucination method is depicted in Fig. 2.
Experimental results and evaluation
This section evaluates the quantitative and subjective SR performance of the proposed OMFPL method on different face image databases. We investigate its effectiveness on two publicly available face databases, (i) the FEI database (Thomaz & Giraldi, 2010) and (ii) the CAS-PEAL-R1 database (Gao et al., 2008), as well as a real-world image dataset, CMU+MIT (Rowley et al., 1998), and a local surveillance image dataset, ABV-IIITM faces (Nagar et al., 2020).
Conclusion and future work
This work presents an adaptive optimal multiple-image-features proportion learning (OMFPL) based model for hallucinating LR face images. It employs several image features to preserve additional detail in the target HR image. Rather than representing an LR image through multiple features in arbitrary proportions, the proposed method adopts the GWO approach to compute the optimum proportion of each of these features. Besides, a reasonable threshold is applied to the various feature training samples to represent the LR face image reliably.
CRediT authorship contribution statement
Surendra Nagar: Conceptualization, Methodology, Data curation, Writing – original draft, Writing – review & editing. Ankush Jain: Methodology, Data curation, Visualization, Writing – original draft, Writing – review & editing. Pramod Kumar Singh: Supervision, Investigation, Writing – review & editing. Ajay Kumar: Supervision, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (61)
- Face image super-resolution using 2D CCA. Signal Processing (2014).
- Binary grey wolf optimization approaches for feature selection. Neurocomputing (2016).
- Super-resolution of human face image using canonical correlation analysis. Pattern Recognition (2010).
- A novel face-hallucination scheme based on singular value decomposition. Pattern Recognition (2013).
- Noise robust position-patch based face super-resolution via Tikhonov regularized neighbor representation. Information Sciences (2016).
- Face hallucination via multiple feature learning with hierarchical structure. Information Sciences (2020).
- Hallucinating face by position-patch. Pattern Recognition (2010).
- Grey wolf optimizer. Advances in Engineering Software (2014).
- Pixel-wise dictionary learning based locality-constrained representation for noise robust face hallucination. Digital Signal Processing (2020).
- Mixed-noise robust face super-resolution through residual-learning based error suppressed nearest neighbor representation. Information Sciences (2021).
- Global consistency, local sparsity and pixel correlation: A unified framework for face hallucination. Pattern Recognition.
- A new ranking method for principal components analysis and its application to face image analysis. Image and Vision Computing.
- Hallucinating faces: LPH super-resolution and neighbor reconstruction for residue compensation. Pattern Recognition.
- TS-GWO: IoT tasks scheduling in cloud computing using grey wolf optimizer.
- Super-resolution of face images using kernel PCA-based prior. IEEE Transactions on Multimedia.
- Robust face image super-resolution via joint learning of subdivided contextual model. IEEE Transactions on Image Processing.
- Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing.
- Ant colony optimization. IEEE Computational Intelligence Magazine.
- The CAS-PEAL large-scale Chinese face database and baseline evaluations. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans.
- Fast facial image super-resolution via local linear transformations for resource-limited applications. IEEE Transactions on Circuits and Systems for Video Technology.
- Simultaneous hallucination and recognition of low-resolution faces based on singular value decomposition. IEEE Transactions on Circuits and Systems for Video Technology.
- Face super-resolution via multilayer locality-constrained iterative neighbor embedding and intermediate dictionary learning. IEEE Transactions on Image Processing.
- Noise robust face hallucination via locality-constrained representation. IEEE Transactions on Multimedia.
- Noise robust face image super-resolution through smooth sparse representation. IEEE Transactions on Cybernetics.