1 Introduction

A multitude of recent studies have focused on ways to improve pattern recognition performance. Face recognition (FR) with a Single Training Sample per Subject (STSS) is particularly difficult because the training set provides too little information to predict the facial variations likely to appear in the query image. Many learning algorithms cannot handle this setting, since they require multiple training samples per subject in order to represent the query face image. One of the conventional learning techniques for FR is the LGR technique [26], which consists in producing an intra-class variation dictionary. The number of gallery samples per subject is one of the principal difficulties in building an effective and robust classification/recognition application. Recognition with a single gallery sample per subject requires data to predict the variability among samples of that subject, unless it is used together with a dictionary [11]. Moreover, in various applications, several training samples per person may be available, covering a variety of expressions, illumination conditions, or occlusions [21].

For many years, FR has been considered a challenging task because of the large variety of facial variations, such as illumination, expression and disguise. The theory of sparse coding, or sparse representation, has been widely employed in signal processing [9]. The idea behind Sparse-Representation-based Classification (SRC) for face recognition [25] is to represent a query face image as a sparse linear combination of the training samples of all classes collected in a dictionary, so that the image is described by only a few non-zero coefficients of a sparse vector.

Fan et al. [5] proposed an SRC-based face recognition algorithm that solves an l0-norm minimization problem. In the SRC algorithm, the non-zero coefficients should concentrate on the training samples that share the same subject as the query sample. Gao et al. [7] proposed a kernel sparse coding algorithm for efficient image classification and face recognition. Wang et al. [20] presented a locality-weighted SRC algorithm that exploits both linearity and locality information. Khadhraoui et al. [10] put forward a multimodal biometric system based on the integration of 2D and 3D face modalities; its main novelty was to apply a Relevance Vector Machine technique for score-level fusion. Borgi et al. [2] proposed an algorithm applying a sparse multiscale representation based on shearlets to extract the essential geometric content of facial features. This article deals with the STSS setting, which is related to the small sample size problem in object recognition. The single training sample setting nevertheless has practical advantages: the creation of a face database is quick and simple, and the storage requirements are modest.

Our algorithm identifies the possible facial variations from a set of labeled faces (the variation type being used as the label), so that it can classify the variation of any given query face. For this purpose, a variant of uLBP called Patch uLBP (PuLBP) is employed as the face descriptor. The high effectiveness of PuLBP in facial recognition shows that it efficiently produces discriminative facial geometric information. The extraction of local patches enables the construction of a local gallery dictionary, while a generic variation dictionary is built by extracting representative information from external generic data, so that the likely facial variations are identified automatically. Each patch of the uLBP query sample is represented only by the patches at the corresponding location of the uLBP gallery dictionary and the uLBP variation dictionary. A half-quadratic optimization procedure is used to solve the resulting optimization problem. Finally, the overall representation residual of the uLBP query sample with respect to each class is employed to perform the global classification.

This article is organized as follows: Sect. 2 describes the background of the Local Binary Pattern (LBP) and its uniform variant (uLBP). A summary of the proposed approach is offered in Sect. 3. Section 4 presents the experimental results. The conclusion and future work are presented in Sect. 5.

2 Background of Local Binary Patterns

The original Local Binary Patterns (LBP) operator was first proposed by Ojala et al. [16] to characterize image texture. LBP is a computationally inexpensive image feature for texture classification/recognition and has been successfully used in 2D face recognition [1]. Formally, the LBP code is computed by thresholding the difference between the central pixel and its neighbors [1], and is expressed in decimal as:

$$\begin{aligned} LBP(x_c,y_c)=\sum _{n=0}^{7} s(i_n-i_c)2^n \end{aligned}$$
(1)

where n runs over the 8 neighbors of the central pixel and \(i_c\) and \(i_n\) are the gray-level values of the central pixel and the neighboring pixels respectively. The function s(n) is defined as:

$$\begin{aligned} s(n)= \left\{ \begin{array}{ll} 1 &{} \text {if } \,n \ge 0,\\ 0 &{} \text {if } \,n < 0. \end{array} \right. \end{aligned}$$
(2)
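To make Eqs. (1) and (2) concrete, the following minimal NumPy sketch computes the basic \(3 \times 3\) LBP code of a single pixel. The clockwise ordering of the neighbors is an illustrative assumption; the equations do not prescribe it, and a different ordering only permutes the bit positions.

```python
import numpy as np

def lbp_code(img, x, y):
    """Basic 3x3 LBP code of the pixel at (x, y), following Eqs. (1)-(2)."""
    center = int(img[x, y])
    # Offsets of the 8 neighbors, visited clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for n, (dx, dy) in enumerate(offsets):
        s = 1 if int(img[x + dx, y + dy]) - center >= 0 else 0  # Eq. (2)
        code += s * (2 ** n)                                     # Eq. (1)
    return code

# Example: LBP code of the central pixel of a small gray-level patch.
patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]], dtype=np.uint8)
print(lbp_code(patch, 1, 1))  # an 8-bit code in [0, 255]
```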

The elementary LBP is limited to a \(3 \times 3\) patch, which is not large enough to capture the dominant descriptors of large-scale structures. The LBP operator can therefore be extended, and one such extension is the uniform LBP introduced in [16]. It was observed that a small number of binary codes account for the essential characteristics of the texture and represent the vast majority of the occurring patterns, more than 90% [1]. These patterns were named “uniform patterns” because they share one property: the circular binary code contains at most two transitions, either from one to zero or from zero to one.

To describe the uniformity of a neighborhood \(\{g_p\}\) around a central pixel \(g_c\), the uniformity measure \(U(LBP_{(P,R)})\) is defined as:

$$\begin{aligned} U(LBP_{(P,R)})=|s(g_{P-1}-g_c)-s(g_0-g_c)|+ \sum _{p=1}^{P-1} |s(g_p-g_c)-s(g_{p-1}-g_c)| \end{aligned}$$
(3)

The LBP operator \(LBP_{(P,R)}\) generates \(2^P\) distinct output values, corresponding to the \(2^P\) binary codes [18]. Uniform patterns can represent spots, edges, fine lines, and corners; Ojala et al. [16] denoted the corresponding operator \(LBP_{(P,R)}^{u2}\). The uniformity measure \(U(LBP_{(P,R)})\) of Eq. (3) counts the number of spatial (bitwise) transitions in the pattern, and a pattern is uniform when it contains at most two such transitions, i.e. \(U(LBP_{(P,R)})\le {2}\). The uLBP operator \(LBP_{(P,R)}^{u2}\) is defined as follows:

$$\begin{aligned} LBP_{(P,R)}^{u2}(x,y)=\left\{ \begin{array}{rl} I(LBP_{(P,R)}(x,y)) &{} \text {if } \, U(LBP_{(P,R)})\le {2},\\ (P-1)P+1 &{} \text {if } \, U(LBP_{(P,R)})> {2}. \end{array} \right. \end{aligned}$$
(4)

where the superscript u2 in Eq. (4) indicates that the definition refers to uniform patterns, i.e. patterns with \(U\le {2}\).

Using the uniform LBP code has two benefits. The first is a saving in memory and computation time. The second is that \(LBP_{(P,R)}^{u2}\) detects only the significant local textures, such as corners, spots, edges and fine lines [18]. In fact, Ojala et al. [16] showed that the uLBP retains more than 90% of the image texture information.
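As an illustration of Eqs. (3) and (4), the sketch below counts the circular bit transitions of a pattern and maps it to its u2 label for P = 8 neighbors. The construction of the index function I(·), which here simply ranks the 58 uniform 8-bit patterns, is an implementation choice not prescribed by the paper; only the fact that all non-uniform patterns share a single label matters in practice.

```python
from itertools import product

def uniformity(bits):
    """U measure of Eq. (3): number of circular 0/1 transitions, where
    bits[p] = s(g_p - g_c) for p = 0, ..., P-1."""
    P = len(bits)
    return sum(bits[p] != bits[p - 1] for p in range(P))  # p = 0 wraps to P-1

# I(.): assign a distinct label to each uniform pattern (58 of them for P = 8);
# the exact label values depend on how I(.) enumerates the uniform patterns.
P = 8
uniform_patterns = [bits for bits in product((0, 1), repeat=P)
                    if uniformity(bits) <= 2]
index_of = {bits: label for label, bits in enumerate(uniform_patterns)}

def u2_label(bits):
    """u2 mapping of Eq. (4): uniform patterns keep their own label,
    all non-uniform patterns fall into one shared bin."""
    P = len(bits)
    if uniformity(bits) <= 2:
        return index_of[tuple(bits)]
    return (P - 1) * P + 1   # shared non-uniform label, as written in Eq. (4)

print(len(uniform_patterns))   # 58 uniform patterns for P = 8
```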

3 Overview of Proposed Approach

To determine the class label of a query face uz, a uLBP-LGR-based classification framework is proposed. Figure 1 presents an overview of this approach, which operates in two phases: offline and online.

Fig. 1. Block diagram of the proposed approach

In short, the approach consists of a uLBP-based feature extraction stage followed by an LGR-based classification stage. The uLBP efficiently captures the geometric elements of the facial image and produces distinctive facial variation information; this step is essential to improve the recognition performance.

3.1 Offline Phase

In this phase, the images of the generic training set and the gallery set are organized for the STSS setting. The gallery dataset contains the training face images. The generic training dataset is divided into two subsets: the reference subset \(G^r\), containing neutral face images, and the variation subset \(G^v\), containing variations in illumination, facial expression and disguise (with sunglasses and scarf). Both subsets are encoded by the uLBP to produce a uLBP reference subset \(uG^r\) and a uLBP variation subset \(uG^v\), written \(uG^r=[uG_1^r,uG_2^r,\ldots ,uG_m^r,\ldots ,uG_M^r]\) and \(uG^v=[uG_1^v,uG_2^v,\ldots ,uG_m^v,\ldots ,uG_M^v]\), where \(uG_m^v\) is the subset of the \(m^{th}\) variation, \(m=1,2,\ldots ,M\). A uLBP generic variation dictionary is then obtained as the difference between the local representations of the uLBP variation subset \(uG^v\) and the uLBP reference subset \(uG^r\):

$$\begin{aligned} uD=[uG_1^v-uG^r,\ldots ,uG_m^v-uG^r,\ldots ,uG_M^v-uG^r] \end{aligned}$$
(5)
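As a minimal illustration of Eq. (5), the sketch below stacks the differences \(uG_m^v-uG^r\) into the generic variation dictionary uD. The column-wise correspondence between the reference subset and each variation subset (same subject in the same column) is an assumption about the data layout, not something fixed by the text.

```python
import numpy as np

def build_variation_dictionary(uG_r, uG_v_list):
    """Generic variation dictionary uD of Eq. (5).

    uG_r      : (d, N) matrix whose columns are uLBP features of neutral faces.
    uG_v_list : list of M matrices uG_m^v, each (d, N); column j of uG_m^v is
                assumed to be the m-th variation of the subject in column j of uG_r.
    """
    diffs = [uG_mv - uG_r for uG_mv in uG_v_list]   # uG_m^v - uG^r, m = 1..M
    return np.hstack(diffs)                         # (d, M*N) dictionary uD

# Toy usage with random feature vectors (d = 6, N = 4 subjects, M = 3 variations).
rng = np.random.default_rng(0)
uG_r = rng.random((6, 4))
uD = build_variation_dictionary(uG_r, [rng.random((6, 4)) for _ in range(3)])
print(uD.shape)  # (6, 12)
```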

The full uLBP-LGR-Training procedure is given in Algorithm 1.

Algorithm 1. uLBP-LGR-Training

3.2 Online Phase

In this phase, the test face image is encoded by the uLBP. The test image is divided into several patches, denoted \(\{uz_1,uz_2,\ldots ,uz_S\}\). Likewise, partitioning the gallery dictionary uX and the generic variation dictionary uD gives \(\{uX_1,uX_2,\ldots ,uX_S\}\) and \(\{uD_1,uD_2,\ldots ,uD_S\}\) respectively, where \(uX_i\) and \(uD_i\) are the PuLBP gallery dictionary and the PuLBP generic variation dictionary associated with the local patch \(uz_i\), \(i=1,2,\ldots ,S\). Based on the two dictionaries \(uX_i\) and \(uD_i\), each local patch \(uz_i\) is represented as follows:

$$\begin{aligned} uz_i=uX_i\alpha _i+uD_i\beta _i+e_i, i=1,2,\ldots S \end{aligned}$$
(6)

where \(\alpha _i\) and \(\beta _i\) are the representation vectors of \(uz_i\) over \(uX_i\) and \(uD_i\) respectively, and \(e_i\) denotes the representation residual. Each patch of the uLBP query sample is represented only by the dictionary patches (\(uX_i\) and \(uD_i\)) at the corresponding location. Finally, the overall representation residual of the uLBP test sample with respect to each class is used to perform the global classification. The identity of a test sample uz is computed as follows:

$$\begin{aligned} \text {label}(uz) = \arg min_k \sum _{i=1}^{S} w_i \frac{\Vert {uz_i-\left[ uX_{i}^{k},uD_i\right] \left[ \alpha _{i}^{k};\beta _i\right] \Vert }_2^2}{\Vert {\left[ \alpha _{i}^{k};\beta _i\right] \Vert }_2^2} \end{aligned}$$
(7)
$$\begin{aligned} \begin{aligned}&\text {Subject to}&uz_i=uX_i\alpha _i+uD_i\beta _i+e_i, i=1,2,\ldots S \end{aligned} \end{aligned}$$
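The following sketch evaluates the class-wise residual of Eq. (7), assuming the coefficients \(\alpha_i\), \(\beta_i\) and the patch weights \(w_i\) have already been estimated as described in Sect. 3.3, and that \(uX_i^k\) (respectively \(\alpha_i^k\)) denotes the columns of \(uX_i\) (coefficients of \(\alpha_i\)) associated with class k. Variable names are illustrative.

```python
import numpy as np

def classify(uz, uX, uD, alphas, betas, w, class_of):
    """Label decision of Eq. (7).

    uz[i], uX[i], uD[i] : query patch, PuLBP gallery dictionary and PuLBP
                          variation dictionary for patch i (i = 1..S).
    alphas[i], betas[i] : estimated representation vectors for patch i.
    w[i]                : patch weight from Eq. (15).
    class_of            : 1-D array giving the class of each gallery column.
    """
    scores = {}
    for k in np.unique(class_of):
        cols = (class_of == k)
        total = 0.0
        for i in range(len(uz)):
            a_k = alphas[i][cols]                              # alpha_i^k
            recon = uX[i][:, cols] @ a_k + uD[i] @ betas[i]    # [uX_i^k, uD_i][alpha_i^k; beta_i]
            coef = np.concatenate([a_k, betas[i]])             # [alpha_i^k; beta_i]
            total += w[i] * np.sum((uz[i] - recon) ** 2) / np.sum(coef ** 2)
        scores[k] = total
    return min(scores, key=scores.get)                         # arg min_k in Eq. (7)
```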

The full uLBP-LGR-Classifying procedure is given in Algorithm 2.

Algorithm 2. uLBP-LGR-Classifying

3.3 Optimization and Classification

Half-quadratic optimization [15] is widely used to solve the minimization problem below, and provides an efficient way to compute \(\{\alpha _i,\beta _i\}\):

$$\begin{aligned} \min _{\{\alpha _i,\beta _i\}} \sum _{i=1}^{S}l(\Vert {e_i}\Vert _2)+{\lambda }{R}{(\alpha _i,\beta _i)} \end{aligned}$$
(8)

To obtain good estimates of the vectors \(\alpha _i\) and \(\beta _i\), a proper regularization must be imposed on \(\alpha _i\) and \(\beta _i\), and the representation residual \({e_i}\) determines the choice of a suitable loss function. The loss function defined on the \(l_2\)-norm of \({e_i}\) is denoted \(l(\Vert {e_i}\Vert _2)\), and the regularizers imposed on the representation coefficients are denoted \({R}{(\alpha _i,\beta _i)}\).

The task is now to choose the regularizer \({R}{(\alpha _i,\beta _i)}\) and the loss function \(l(\Vert {e_i}\Vert _2)\). Based on the above analysis, one obtains the following regularized formulation:

$$\begin{aligned} \{\widehat{\alpha _i},\widehat{\beta _i}\}=\arg \min _{\{\alpha _i,\beta _i\}}\Vert {uz_i-uX_i\alpha _i-uD_i\beta _i\Vert }_2^2+\lambda (\Vert {{\alpha _i}\Vert }_2^2+\Vert {{\beta _i}\Vert }_2^2) \end{aligned}$$
(9)

The representation residual is then computed as

$$\begin{aligned} e_i=\Vert {uz_i-uX_i\widehat{\alpha _i}-uD_i\widehat{\beta _i}\Vert }_2 \end{aligned}$$
(10)
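A minimal sketch of one way to solve the \(l_2\)-regularized problem of Eq. (9) and to compute the residual of Eq. (10), using the standard ridge-regression closed form over the stacked dictionary \([uX_i, uD_i]\); the actual solver used by the authors is not specified.

```python
import numpy as np

def solve_patch(uz_i, uX_i, uD_i, lam=1e-3):
    """l2-regularized representation of one patch (Eq. (9)) and its residual (Eq. (10))."""
    B = np.hstack([uX_i, uD_i])                                  # stacked dictionary [uX_i, uD_i]
    n = B.shape[1]
    a = np.linalg.solve(B.T @ B + lam * np.eye(n), B.T @ uz_i)   # ridge closed form
    n_x = uX_i.shape[1]
    alpha_i, beta_i = a[:n_x], a[n_x:]
    e_i = np.linalg.norm(uz_i - uX_i @ alpha_i - uD_i @ beta_i)  # Eq. (10)
    return alpha_i, beta_i, e_i
```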

After computing the representation residual as in Eq. (10), a Correntropy Induced Metric (CIM) [12] is defined as:

$$\begin{aligned} CIM(e_i)=(k_\sigma (0)-k_\sigma (e_i))^{1/2} \end{aligned}$$
(11)

where \(k_\sigma (x)\) is the kernel function.

The CIM is employed to measure the representation residual of every patch. Finally, the proposed uLBP-LGR model becomes:

$$\begin{aligned} \min _{\{\alpha _i,\beta _i\}} \sum _{i=1}^{S}(1-k_\sigma (\Vert {e_i}\Vert _2))+\lambda (\Vert {{\alpha _i}\Vert }_2^2+\Vert {{\beta _i}\Vert }_2^2) \end{aligned}$$
(12)
$$\begin{aligned} \begin{aligned}&\text {Subject to}&uz_i=uX_i\alpha _i+uD_i\beta _i+e_i, i=1,2,\ldots S \end{aligned} \end{aligned}$$

The augmented minimization problem corresponding to Eq. (12) can be written as:

$$\begin{aligned} \min _{\{A,w\}} \sum _{i=1}^{S}({\frac{1}{2}}w_i\Vert {uz_i-uX_i\alpha _i-uD_i\beta _i\Vert }_2^2+\varphi (w_i))+\lambda \Vert {A\Vert }_F^2 \end{aligned}$$
(13)

where \(A=[a_1,a_2,\ldots ,a_i,\ldots ,a_S]\) with \(a_i= [\alpha _i;\beta _i]\), \(w=[w_1,w_2,\ldots ,w_S]\), and \(\varphi (w_i)\) is the dual function. The above problem can be efficiently optimized through half-quadratic minimization [15] by alternately updating A and w. When w is fixed, A is updated as:

$$\begin{aligned} \widehat{A}=\arg min_{A} \sum _{i=1}^{S}(w_i\Vert {uz_i-uX_i\alpha _i-uD_i\beta _i\Vert }_2^2)+\lambda \Vert {A\Vert }_F^2 \end{aligned}$$
(14)

When A is fixed, the weights w are obtained as:

$$\begin{aligned} \widehat{w}_i=\frac{1}{\sigma ^2}exp(\frac{-\Vert {uz_i-uX_i\alpha _i-uD_i\beta _i\Vert }_2^2}{2\sigma ^2}) \end{aligned}$$
(15)

The weight \(w_i\) corresponds to the \(i^{th}\) patch and controls the contribution of \(\Vert {e_i\Vert }_2\) to the overall energy of Eq. (13).
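The alternating scheme of Eqs. (13)–(15) can be sketched as follows. Since \(\Vert A\Vert_F^2\) decouples over the patches, the update of Eq. (14) reduces to one weighted ridge problem per patch. The fixed iteration count and the re-estimation of the kernel scale at each pass are simplifying assumptions that follow Eq. (16) only loosely.

```python
import numpy as np

def half_quadratic(uz, uX, uD, lam=1e-3, n_iter=10):
    """Alternating minimization of Eq. (13): update A with w fixed (Eq. (14)),
    then update w with A fixed (Eq. (15))."""
    S = len(uz)
    B = [np.hstack([uX[i], uD[i]]) for i in range(S)]     # [uX_i, uD_i]
    a = [np.zeros(Bi.shape[1]) for Bi in B]               # a_i = [alpha_i; beta_i]
    w = np.ones(S)
    for _ in range(n_iter):
        # Eq. (14): weighted ridge update of each a_i (the problem decouples).
        for i in range(S):
            n = B[i].shape[1]
            a[i] = np.linalg.solve(w[i] * B[i].T @ B[i] + lam * np.eye(n),
                                   w[i] * B[i].T @ uz[i])
        # Eq. (15): weight update from the current residuals, with the kernel
        # scale estimated from the mean squared residual (cf. Eq. (16)).
        res2 = np.array([np.sum((uz[i] - B[i] @ a[i]) ** 2) for i in range(S)])
        sigma2 = max(res2.mean() / 2.0, 1e-12)
        w = np.exp(-res2 / (2.0 * sigma2)) / sigma2
    return a, w
```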

3.4 Parameter Setting

In the experimental work, every image is resized to \(80 \times 80\) pixels and the patch size is fixed at \(20 \times 20\). This setting is used for all patch/block-based methods, including SRC, LGR and the proposed uLBP-LGR. The overlap between neighboring patches is set to 10 pixels, which means that the test sample is divided into S = 25 patches. Apart from the patch number and the patch size, the proposed uLBP-LGR has only two parameters to set. The first is the regularization parameter, which is fixed at \(\lambda =0.001\) in all experiments. The second is the scale parameter employed in the kernel function \(k_\sigma (x)\), computed as:

$$\begin{aligned} \sigma ={\sqrt{\frac{1}{2S}}}\sum _{i=1}^{S}\Vert {uz_i-uX_i\alpha _i-uD_i\beta _i\Vert }_2^2 \end{aligned}$$
(16)

The parameters of competing algorithms are tuned for improved results.

4 Experimental Results

This section presents the results of our approach and evaluates its performance on standard face databases that exhibit multiple variations (facial expressions, illumination and disguise). The results demonstrate the improvement of the uLBP-LGR over the LGR and other methods and offer a comprehensive evaluation of the uLBP-LGR performance. Our approach is also compared with state-of-the-art approaches such as the RRC [23], the CRC [25], the Directed Acyclic Graph SVM (SVM-DAG) [8], the RKR [24], the one-against-all SVM (SVM-OAA), the nearest neighbor (NN) classifier, the BHDT [3] and the MetaFace [22].

4.1 Comparative Experiments: FR on the FRGCv1, ORL, CK+, FEI and GT Databases

This experiment summarizes the results obtained by applying our proposed approach to benchmark face databases, including FEI [6] (200 images), FRGCv1 [17] (152 images), Georgia Tech (GT) [4] (50 images), ORL (40 images), the Extended Cohn-Kanade (CK+) [13] (123 images) and AR [14]. All images are resized to \(27 \times 32\). Table 1 displays the recognition results on the FRGCv1, ORL, CK+, FEI and GT databases.

Table 1. Recognition accuracy on the FRGCv1, ORL, CK+, FEI and GT databases

4.2 Running Time Comparison

All experiments were implemented in MATLAB 7.0.1 and run on a PC with an Intel(R) Core(TM) i3 processor (2.2 GHz) and 4 GB of RAM. In practical applications, training is usually an offline stage and recognition an online one. Table 2 compares the average running time in seconds on the FRGCv1, CK+, ORL, FEI, GT and AR databases, and shows that uLBP-LGR is computationally less expensive than the state-of-the-art approaches.

Table 2. Average running time in seconds on the FRGCv1, ORL, CK+, FEI and GT databases

5 Conclusion

In this paper, we have presented a novel Patch uLBP-LGR approach for the difficult task of face recognition with a single training sample per subject (STSS). The uLBP-LGR combines the benefits of patch-based uLBP features and generic representation. A generic intra-class variation dictionary is built from a uLBP generic dataset. A CIM is adopted to account for the highly non-Gaussian distribution of the representation residuals over the different patches and to measure the contribution of each patch, which allows a more robust weighting of the patches in face recognition. The application of this algorithm to the benchmark face databases has demonstrated the efficiency of the LGR-based method, which consistently achieves a better recognition rate than the previously cited state-of-the-art STSS methods.

The extensive numerical tests and the detailed comparison with conventional and state-of-the-art approaches have shown that the proposed uLBP-LGR approach is highly competitive even in demanding classification tests. Moreover, the method runs at a relatively low computational cost, since it relies on the same efficient structure used by the LGR for the classification step.