Adaptive optimal multi-features learning based representation for face hallucination
Introduction
Facial image analysis is a compelling research field that has been widely explored in computer vision and pattern recognition. Accordingly, several applications, e.g., face detection, tracking, alignment, feature extraction, recognition, and face-sketch synthesis, have been introduced and investigated. In recent years, these techniques have played an essential role in many real-world tasks, e.g., person identification, human–machine interaction, security management, surveillance systems, law enforcement, and digital entertainment (Shi & Zhao, 2019). However, the performance of all these face analysis tasks degrades significantly when the input face images are acquired under undesirable conditions such as large distance, poor illumination, and low resolution. In particular, low resolution is a critical and frequently encountered problem, usually caused by the subject's large distance from the imaging system.
LR images impose several constraints on real-life face analysis applications, which require high-quality images to achieve the desired performance. Therefore, before using the original LR images in these applications, it is necessary to obtain their corresponding high-resolution counterparts. HR images can be restored from given LR images through super-resolution (SR) techniques (Wang et al., 2014). Based on the recent literature, existing SR methods can be grouped into two classes: (i) interpolation-based SR (Dong et al., 2013, Sun et al., 2011, Zhang et al., 2012, Zhang and Wu, 2008) and (ii) learning-based SR (Baker and Kanade, 2000, Chang et al., 2004, Jiang et al., 2016, Jiang et al., 2014a, Jiang et al., 2014b, Jiang et al., 2017, Jiang, Yu et al., 2018, Ma et al., 2010). Interpolation-based SR methods obtain HR images by estimating prior knowledge (statistical properties) from the available LR images; however, they perform poorly when an application requires a high magnification factor. In learning-based SR methods, HR images are rendered by learning the mapping relationship between HR and LR exemplar/training sets. These methods obtain impressive results, specifically under higher magnification factors, owing to informative and large training sets. Thus, in this work, we primarily focus on learning-based SR methods.
The pioneering FH model was presented by Baker and Kanade (2000); it obtains HR images by employing a Bayesian approach. Based on the LR image representation scheme, existing learning-based SR methods can be broadly grouped into two subcategories: (i) global-features based and (ii) local-patch based.
In the global-features based approach, the entire LR face image is represented using a traditional statistical model. A few notable representative works are: Principal Component Analysis (PCA) (Wang & Tang, 2005), kernel PCA (Chakrabarti et al., 2007), Singular Value Decomposition (SVD) (Jian and Lam, 2015, Jian et al., 2013), Canonical Correlation Analysis (CCA) (Huang et al., 2010), two-dimensional CCA (An & Bhanu, 2014), Locality Preserving Projections (LPP) (Zhuang et al., 2007), and Non-negative Matrix Factorization (NMF) (Wang et al., 2014). These methods compute the optimal coefficients of the training samples in the LR space to represent the LR face image and then apply the same coefficients in the corresponding HR space to obtain the HR face image. Though global methods are easy to implement, they fail to preserve detailed facial features, especially when the input image differs substantially from the training images.
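The coefficient-transfer idea behind these global methods can be sketched as follows. The helper name, the ridge term `eps`, and the toy data below are illustrative assumptions, not the formulation of any particular cited work:

```python
import numpy as np

def global_coeff_transfer(lr_train, hr_train, lr_test, eps=1e-3):
    """Represent an LR face as a linear combination of LR training faces,
    then reuse the same coefficients over the aligned HR training faces."""
    A = lr_train                                   # (d_lr, n): one face per column
    # Ridge-regularized least squares keeps the solve well-conditioned.
    w = np.linalg.solve(A.T @ A + eps * np.eye(A.shape[1]), A.T @ lr_test)
    return hr_train @ w                            # hallucinated HR face (vectorized)

# toy usage: an LR face that is an exact mixture of the training faces
rng = np.random.default_rng(0)
lr_train = rng.standard_normal((16, 5))            # 5 vectorized 4x4 LR faces
hr_train = rng.standard_normal((64, 5))            # aligned 8x8 HR faces
true_w = np.array([0.2, 0.3, 0.1, 0.25, 0.15])
hr_face = global_coeff_transfer(lr_train, hr_train, lr_train @ true_w)
```

Because the mapping is learned over whole faces, any local detail absent from the training set cannot be recovered, which is exactly the weakness noted above.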
In contrast, local patch-based methods provide better performance by effectively exploiting the local manifold structure of the training data. They first divide the LR image into several small, equal-sized regions called patches, and then represent each patch using the training samples. Roweis and Saul (2000) propose the locally linear embedding (LLE) method for non-linear dimensionality reduction. Following the assumption that HR patches and their LR counterparts share a similar local spatial structure, Chang et al. (2004) introduce a neighbor embedding (NE) based SR approach, which reconstructs the target HR patch using the nearest neighbors of the input LR patch. Jiang et al. (2016) adopt a similar idea; they improve the results by employing Tikhonov regularization, which suppresses excessive variation in the data fidelity term of the objective function. Later, Zhang and Cham (2011) introduce the LLE approach in the discrete cosine transformation (DCT) domain to predict local facial features through the AC coefficients.
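The NE reconstruction of a single patch can be sketched in the LLE style. The function name, the neighborhood size `k`, and the Gram-matrix regularizer `reg` are illustrative assumptions rather than the exact settings of the cited works:

```python
import numpy as np

def neighbor_embedding_patch(lr_patches, hr_patches, lr_query, k=5, reg=1e-4):
    """Reconstruct one HR patch from the K nearest LR training patches
    using LLE-style sum-to-one reconstruction weights."""
    d = np.linalg.norm(lr_patches - lr_query, axis=1)
    nn = np.argsort(d)[:k]                         # indices of K nearest neighbors
    Z = lr_patches[nn] - lr_query                  # neighbors centered on the query
    G = Z @ Z.T + reg * np.eye(k)                  # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                                   # enforce the sum-to-one constraint
    return w @ hr_patches[nn]                      # transfer weights to HR patches

# toy usage with random, aligned LR/HR patch pairs
rng = np.random.default_rng(1)
lr_patches = rng.standard_normal((50, 9))          # 50 vectorized 3x3 LR patches
hr_patches = rng.standard_normal((50, 36))         # aligned 6x6 HR patches
hr_patch = neighbor_embedding_patch(lr_patches, hr_patches, lr_patches[0])
```

The same weights computed in the LR space are applied to the HR neighbors, which is exactly the shared-manifold assumption that later methods in this survey question.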
Considering the similar position of an image patch, Ma et al. (2010) introduce a least-squares representation (LSR) based SR model, which reconstructs the input LR patch using the identically positioned LR training patches. However, its results become unstable when the input patch has a lower dimension than the number of training samples. Subsequently, several methods adopt the concept of positioned training patches to reconstruct the target HR image. Jung et al. (2011) address the instability problem of Ma et al. (2010) by introducing an ℓ1-norm constraint, which yields more stable HR faces. Huang and Wu (2011) present an SR approach that approximates the non-linear relationship between LR and HR faces using multiple local linear transformations. Shi et al. (2014) propose a unified FH framework that jointly incorporates local and global priors using a local sparsity model together with a pixel correlation model. Later, Jiang et al. (2017) include smooth priors in a sparse representation based model, which assigns similar coefficients to similar training samples.
Jiang et al. (2014b) present a locality-constrained representation (LcR) based FH method that shows promising performance by revealing the local topology of the nonlinear manifold. Meanwhile, Jiang et al. (2014a) propose an FH method that reconstructs the HR face using multi-layer LcR with iterative NE and intermediate dictionary learning (LINE). Later, Liu et al. (2018) adopt the idea of locality and propose a noise-robust FH model, namely LcR with bi-layer representation (RLcBR), which obtains promising results under high impulsive noise. Recently, Jiang, Yu et al. (2018) extend the LcR method by incorporating contextual patch information and present a thresholding based LcR with a reproducing learning scheme (TLcR-RL). Due to the additional contextual patch information, TLcR-RL achieves better reconstruction performance and can also handle small misalignment problems. However, owing to the large number of patches formed, it takes much time to reconstruct the target HR image, which is unsuitable for real-time applications such as LR face recognition. Recently, Nagar et al. (2020) propose a threshold-based pixel-wise dictionary learning LcR (T-DLcR) method, which effectively hallucinates LR faces corrupted with high impulse noise. Further, Nagar et al. (2021) present a PCA-mate face based mixed-noise robust face SR scheme, namely residual-learning based error suppressed nearest neighbor representation (RLENR), which effectively addresses the mixed-noise problem.
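The locality idea shared by this family of methods admits a compact closed form. The sketch below follows the general LcR form (data fidelity plus a distance-weighted penalty) with an illustrative `tau`; it omits the sum-to-one constraint that LcR additionally enforces via a Lagrangian, so it is a simplification rather than the cited formulation:

```python
import numpy as np

def lcr_weights(Y, x, tau=0.04):
    """Locality-constrained weights for one position-patch:
    minimize ||x - Y w||^2 + tau * sum_i (d_i * w_i)^2, where d_i is the
    distance from x to the i-th training patch (i-th column of Y)."""
    d = np.linalg.norm(Y - x[:, None], axis=0)     # locality penalties
    w = np.linalg.solve(Y.T @ Y + tau * np.diag(d ** 2), Y.T @ x)
    return w                                       # sum-to-one constraint omitted

# toy usage: a query identical to one training patch gets all its weight,
# since its locality penalty d_0 is zero
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 6))                   # 6 training patches as columns
w = lcr_weights(Y, Y[:, 0])
```

Distant training patches receive large penalties and thus near-zero weights, which is how LcR reveals the local manifold topology mentioned above.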
With advancements in neural networks, many researchers have turned their attention to deep learning strategies for addressing pattern recognition and computer vision problems. For the image SR problem, Dong et al. (2016) present the first three-layer convolutional neural network (CNN), which learns a mapping function between LR and HR images. Kim et al. (2016) further improve SR performance by introducing a very deep CNN for HR image reconstruction. Zhu et al. (2016) present a cascaded deep bi-network based FH model that alternately optimizes dense field estimation and hallucination; moreover, its gated bi-network recovers texture details at different levels. Song et al. (2017) first employ an existing CNN to generate HR facial components and then apply an enhancement strategy that combines these components with a face image obtained from the training HR images. Cao et al. (2017) present an FH model utilizing an attention-aware scheme, which chooses the most appropriate facial components via reinforcement learning; they further use an enhancement network to reconstruct local facial details. Huang et al. (2017) propose a face SR model that predicts the wavelet coefficients of the training HR images before reconstructing the target HR image, capturing local texture details and global topology information of facial images through a flexible and extensible CNN. Shi et al. (2019) employ deep reinforcement learning in an attention-aware face SR framework that attends to a series of patches and enhances different facial parts by exploiting the global inter-dependency of images. Recently, Li et al. (2020) propose a deep face dictionary network (DFDNet) for blind face restoration; it employs a K-means approach to generate deep dictionaries of perceptually significant face components.
Though the aforementioned deep learning methods attain impressive SR performance on noise-free facial images, they fail when the input LR faces are contaminated with heavy noise.
Recently, some researchers have focused on exploiting different image features to attain complementary and rich details in the reconstructed facial image. For example, Liu et al. (2020) propose a multiple-feature learning and hierarchical structure (MLHS) based face SR model, which utilizes several horizontal and vertical gradient features in the LR space; it employs two levels, a middle layer and a higher layer, for hallucinating the input LR faces. Chen et al. (2019) present a joint-learning-based contextual face SR model for noisy LR faces, which jointly utilizes HOG features with subdivided contextual-patch information to capture complementary details in the reconstructed HR face image. Though these methods improve the reconstruction performance for LR face images affected by very low levels of Additive White Gaussian Noise (AWGN), they cannot effectively deal with LR images corrupted by high noise.
Recently, a few face SR methods, e.g., Chen et al. (2019) and Liu et al. (2020), utilize a number of image features along with the basic pixel information. The problem here is that a predefined proportion of these features is fixed for all input images, even though each image may carry distinct local geometrical information. Existing feature-based methods fix these values by experimentally analyzing the reconstruction results on a set of test images. Practically, this is not an optimal way to calculate the appropriate proportion of each feature for different input LR images. For better reconstruction, an accurate proportion of these features can be obtained for individual test images using existing optimization techniques. Recently, swarm intelligence (SI) based approaches have achieved great success in solving such problems. SI based approaches mimic the social behavior of swarms, flocks, herds, or schools of creatures in the environment. Some recent and popular SI based methods include Particle swarm optimization (PSO) (Kennedy & Eberhart, 1995), Ant colony optimization (ACO) (Dorigo et al., 2006), the Artificial bee colony algorithm (ABC) (Basturk & Karaboga, 2006), the Bat-inspired algorithm (BA) (Yang, 2010), and the Cuckoo search algorithm (CS) (Yang & Deb, 2009). PSO is mainly inspired by the flocking behavior of birds: multiple particles (candidate solutions) follow both the best particle (the solution with the best position) and their own personal best positions acquired so far. The ACO algorithm is inspired by the collective behavior of ants foraging along the shortest path. The ABC algorithm mimics the social functioning of bees in finding food sources. The BA algorithm is inspired by the echolocation behavior of bats in finding and hunting their prey. The CS algorithm is inspired by the brood parasitism of a few cuckoo species that lay their eggs in the nests of other host birds.
Recently, Mirjalili et al. (2014) propose the Grey wolf optimization (GWO) algorithm, which mimics the social behavior of grey wolves in searching for and hunting prey. It effectively solves optimization problems for several numerical benchmark functions without getting stuck in local optima. Moreover, GWO requires tuning only a single parameter; hence, it is well suited to real-world problems. Owing to these advantages, GWO has been widely employed in various computer science applications, e.g., image processing (Li et al., 2016, Rajput et al., 2019), machine learning (Emary et al., 2016, Mosavi et al., 2016), scheduling (Abualigah et al., 2020, Jiang and Zhang, 2018, Jiang et al., 2018), and system reliability (Kumar et al., 2017, Kumar et al., 2019). Due to its simplicity and convergence capability, it can also be adopted to solve the problem mentioned earlier, i.e., finding the optimal proportion of multiple features to adequately represent the LR face image using the LR-HR training samples.
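A minimal sketch of the GWO update follows: the pack encircles the three best wolves (alpha, beta, delta) with a linearly decaying coefficient `a`. The bounds, population size, and the sphere test function are illustrative choices, not the settings used in this paper:

```python
import numpy as np

def gwo_minimize(f, dim, n_wolves=20, iters=100, lb=-5.0, ub=5.0, seed=0):
    """Minimal grey wolf optimizer: every wolf moves toward the average
    of three encircling steps taken around alpha, beta, and delta."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))       # initial pack positions
    for t in range(iters):
        order = np.argsort([f(x) for x in X])
        leaders = X[order[:3]].copy()              # alpha, beta, delta wolves
        a = 2.0 * (1.0 - t / iters)                # decays linearly from 2 to 0
        for i in range(n_wolves):
            steps = []
            for leader in leaders:
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2.0 * a * r1 - a, 2.0 * r2  # exploration/exploitation coefficients
                steps.append(leader - A * np.abs(C * leader - X[i]))
            X[i] = np.clip(np.mean(steps, axis=0), lb, ub)
    best = min(X, key=f)
    return best, f(best)

# usage: minimize the 3-dimensional sphere function
best, val = gwo_minimize(lambda p: float(np.sum(p ** 2)), dim=3)
```

Early iterations (|A| > 1) favor exploration while late ones (|A| < 1) favor exploitation, which is why a single decaying parameter suffices, as noted above.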
Though the aforementioned methods achieve satisfactory performance on the image SR problem, they possess a few limitations. First, only spatial pixel information along with a single type of image feature (e.g., gradient features, HOG features) has been exploited to learn the mapping function or coefficient representation in the corresponding LR-HR space. This limited feature information is insufficient to represent the LR image accurately, resulting in unsatisfactory HR image reconstruction. Indeed, considering diverse image features may capture additional pattern descriptions that provide complementary information and better reconstruction results. Second, the assumption that the LR and HR spaces share a similar manifold structure is not always satisfied, owing to the one-to-many relationship between HR and LR samples. Consequently, the geometry of the LR patch space cannot represent the true neighboring relationships of the HR space. Third, traditional methods generate inappropriate reconstruction coefficients when the LR observation and the training samples differ considerably in appearance due to environmental factors, e.g., camera sensor noise, varying illumination, or haze. Therefore, the above methods cannot achieve the desired reconstruction performance, especially for noisy LR images. To resolve these limitations, this paper presents an adaptive multi-feature learning-based neighbor representation to effectively hallucinate both noise-free and noisy LR facial images. The primary contributions of this work are summarized as follows:
- Incorporate several informative transformation-domain image features (e.g., multi-order gradients, contrast-preserving features, multi-resolution features) along with spatial pixel information to achieve consistent and high reconstruction performance.
- Formulate a min-type objective function for determining the adaptive, optimal proportion of each predetermined image feature, and employ the widely adopted GWO to obtain an optimal and stable solution.
- Obtain the optimal representation of the observed LR image and its corresponding features by determining appropriate threshold values for the respective LR training samples.
- Validate the performance of the presented SR model on two standard face databases, one real-world image dataset, and locally captured low-resolution surveillance images.
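As an illustration of the first contribution, the sketch below stacks a few transformation-domain features of the kinds listed above: first-order gradients, a Laplacian as a second-order, contrast-type feature, and a coarse-scale image as a multi-resolution feature. The exact feature set and the proportions learned in the proposed method differ; this is only an assumed, minimal feature stack:

```python
import numpy as np

def multi_features(img):
    """Stack raw pixels with a few transformation-domain features.
    Assumes an even-sized grayscale image; illustrative only."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                       # first-order gradients
    lap = np.gradient(gy, axis=0) + np.gradient(gx, axis=1)  # Laplacian (second order)
    h, w = img.shape
    coarse = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # 2x2 block means
    multi_res = np.kron(coarse, np.ones((2, 2)))    # coarse scale, upsampled back
    return np.stack([img, gy, gx, lap, multi_res])  # (5, h, w) feature stack

feats = multi_features(np.arange(64, dtype=float).reshape(8, 8))
```

Each channel of such a stack can then be represented over its own feature-specific training samples, with the per-feature proportions left to the optimizer rather than fixed by hand.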
The rest of the paper is organized as follows. Section 2 defines the mathematical notation used in this work and briefly reviews some seminal position-patch-based face SR methods. Section 3 details the proposed methodology. Section 4 presents the results of various experiments under optimal parameter settings, demonstrating the effectiveness of the proposed face SR method with respect to existing state-of-the-art SR methods. Section 5 concludes the paper and outlines future work.
Related work
In recent years, position-patch-based face SR methods have gained considerable attention owing to their better reconstruction capability, especially under substantial magnification factors. Moreover, these methods obtain different reconstruction coefficients for nearly similar local patch representations, which helps attain more detailed reconstructed features. Considering these advantages, we follow the position-patch-based mechanism to reconstruct the target HR face image. A brief review of these methods is provided below.
Proposed methodology
This section comprises a detailed description of the proposed OMFPL method. The flow diagram of OMFPL is depicted in Fig. 1. The complete working pipeline of the proposed face hallucination method is depicted in Fig. 2.
Experimental results and evaluation
This section evaluates the quantitative and subjective SR performance of the proposed OMFPL method on different face image databases. We investigate its effectiveness on two publicly available face databases, (i) the FEI database (Thomaz & Giraldi, 2010) and (ii) the CAS-PEAL-R1 database (Gao et al., 2008), as well as a real-world image dataset, CMU+MIT (Rowley et al., 1998), and a local surveillance image dataset, ABV-IIITM faces (Nagar et al., 2020).
Conclusion and future work
This work presents an adaptive optimal multiple-image-features proportion learning (OMFPL) based model for hallucinating LR face images. It employs several image features to preserve additional detail in the target HR image. Rather than representing an LR image through multiple features in arbitrary proportions, the proposed method adopts the GWO approach to compute the optimum proportion of each of these features. Besides, a reasonable threshold is applied to the various feature training samples to represent the LR face image reliably.
CRediT authorship contribution statement
Surendra Nagar: Conceptualization, Methodology, Data curation, Writing – original draft, Writing – review & editing. Ankush Jain: Methodology, Data curation, Visualization, Writing – original draft, Writing – review & editing. Pramod Kumar Singh: Supervision, Investigation, Writing – review & editing. Ajay Kumar: Supervision, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (61)
- Face image super-resolution using 2D CCA. Signal Processing (2014).
- Binary grey wolf optimization approaches for feature selection. Neurocomputing (2016).
- Super-resolution of human face image using canonical correlation analysis. Pattern Recognition (2010).
- A novel face-hallucination scheme based on singular value decomposition. Pattern Recognition (2013).
- Noise robust position-patch based face super-resolution via Tikhonov regularized neighbor representation. Information Sciences (2016).
- Face hallucination via multiple feature learning with hierarchical structure. Information Sciences (2020).
- Hallucinating face by position-patch. Pattern Recognition (2010).
- Grey wolf optimizer. Advances in Engineering Software (2014).
- Pixel-wise dictionary learning based locality-constrained representation for noise robust face hallucination. Digital Signal Processing (2020).
- Mixed-noise robust face super-resolution through residual-learning based error suppressed nearest neighbor representation. Information Sciences (2021).
- Global consistency, local sparsity and pixel correlation: A unified framework for face hallucination. Pattern Recognition.
- A new ranking method for principal components analysis and its application to face image analysis. Image and Vision Computing.
- Hallucinating faces: LPH super-resolution and neighbor reconstruction for residue compensation. Pattern Recognition.
- TS-GWO: IoT tasks scheduling in cloud computing using grey wolf optimizer.
- Super-resolution of face images using kernel PCA-based prior. IEEE Transactions on Multimedia.
- Robust face image super-resolution via joint learning of subdivided contextual model. IEEE Transactions on Image Processing.
- Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing.
- Ant colony optimization. IEEE Computational Intelligence Magazine.
- The CAS-PEAL large-scale Chinese face database and baseline evaluations. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans.
- Fast facial image super-resolution via local linear transformations for resource-limited applications. IEEE Transactions on Circuits and Systems for Video Technology.
- Simultaneous hallucination and recognition of low-resolution faces based on singular value decomposition. IEEE Transactions on Circuits and Systems for Video Technology.
- Face super-resolution via multilayer locality-constrained iterative neighbor embedding and intermediate dictionary learning. IEEE Transactions on Image Processing.
- Noise robust face hallucination via locality-constrained representation. IEEE Transactions on Multimedia.
- Noise robust face image super-resolution through smooth sparse representation. IEEE Transactions on Cybernetics.