1 Introduction

Recent years have seen extensive investigation of image usage for human identification and authentication. Even though biometric technologies such as fingerprint and iris scanning seem to be more accurate, they require more human collaboration than face recognition techniques. Moreover, the advent of 3D imaging technologies has brought a further boost to the development of face recognition. In fact, the new generation of acquisition devices is now capable of capturing the geometry of 3D objects in three-dimensional physical space.

Besides shape information, face imaging in general has emerged as a promising modality with respect to other biometric recognition techniques, offering advantages such as universal acceptance and non-invasiveness. Moreover, 3D face imaging addresses some limitations of its 2D counterpart, such as pose and illumination variation, while opening up new horizons for enhancing the reliability of face-based identification systems [5]. This trend has been further fueled by advances in 3D scanning technology, which now provides 3D textured scans encompassing aligned shape and photometric data.

In this paper, after a brief literature review, we describe the mesh-LBP framework (Sect. 2), then outline the proposed approach and analyze its potential (Sect. 3.1); finally, some preliminary results are presented to support our proposal (Sect. 4).

1.1 Related Works

The state of the art offers a wealth of 3D face recognition approaches, making it impossible to analyze all of them. Instead, we present the works that guided our decisions, grouping them into three categories.

First, there are approaches whose strength lies in the local description given by fiducial points. Such methods use a local representation of the face that natively supports partial matching, and they have been gaining credit in the community in recent years. In fact, a face can be described as a whole (global representation) or as a combination of local partitions. Each partition, or region, is represented by a descriptor [25], and the combination of such descriptors forms the representation of the face. Using fiducial points, it is also possible to obtain a face matching that handles expression-induced distortions: regions that are highly affected by expression deformation, mouth and eyebrows above all, can be isolated and discarded. One of the first proposed approaches [16] presented a keypoint detector based on SIFT [12]; however, it did not account for partial scans and face rotation. Later, [14, 19] presented a SIFT-based method designed to work on mesh manifolds instead of standard flat images. This new mesh-SIFT has been used in [9] together with the Sparse Representation based Classifier (SRC) [24] to boost keypoint matching.

The second category is composed of the Local Binary Pattern (LBP) based approaches. LBP was proposed in [2] as a 2D descriptor that performed well in texture retrieval problems. Given its success, it was applied to the face recognition problem in [7], and later to 3D face recognition. In fact, LBP is now widely used on depth images [8, 11], performing very well from both precision and performance perspectives. Moreover, LBP's versatility has allowed several variants to be built. In [18] the Local Normal Binary Pattern (LNBP) was introduced, which uses angles between normals instead of depth values. 3D-LBP [20] works on a mesh, computing the code using two kinds of values: the depth values and the angles between the normals of the mesh vertices. Such an approach, however, requires elaborate processing of the mesh in order to obtain the neighborhood of a central vertex. Moreover, 3D-LBP does not support multiple scale resolutions like other previous LBP variants.

Finally, there is the group of multi-modal 2D-3D approaches. Multi-modal solutions aim to combine different processing paths, usually 2D and 3D, into a single framework in order to overcome the limitations of the individual approaches. In [6], Principal Component Analysis (PCA) is applied to depth images and standard images separately, and the outcomes are then combined to obtain the final result. In [13], Iterative Closest Point (ICP) is used to register the 3D face model, combined with Linear Discriminant Analysis (LDA) applied to the 2D image to avoid illumination and pose variation problems. Finally, [15] performs face registration to avoid pose variations, region segmentation to account for local geometry changes, a filtering of the scans using SIFT and the 3D Spherical Face Representation (SFR), and then a region-wise matching against the remaining faces, focusing on regions robust to expression distortions.

2 Mesh-LBP

Our reference work generates Local Binary Patterns (LBP) over a real 3D support represented by a triangular mesh manifold. In fact, LBP has recently been revisited in [21, 23]. Since its definition [17] and its simplest application to face recognition [1], LBP has been an 8-bit code obtained by comparing pixel values inside a \(3\times 3\) window; the outcome of each comparison is 1 or 0, depending on whether the difference with the neighbor's value is greater than or less than zero. This pattern can be extended to different scales by changing the window dimension and adopting circular neighborhoods at different radii.

In [23], the LBP idea was broadened to 2D-mesh manifolds, implementing the power and elegance of LBP on a real 3D support.

Instead of pixels, the mesh is composed of facets. In order to obtain an ordered ring around a generic central facet \(f_c\), the algorithm searches for adjacent facets \(f_{out}\) and iteratively concatenates them, as shown in Fig. 1. In this elegant way, it is possible to generate a ring-like pattern at different radial scales. In fact, a new sequence of ordered \(f_{out}\) facets on the ring's outer corner can be extracted, allowing the ring construction procedure to be iterated (as shown in Fig. 2), generating concentric rings around the initial central facet \(f_c\).

Fig. 1. From left to right, the ring construction pipeline of the mesh-LBP framework: starting from the adjacent facets, then the construction of the ring [22].

Fig. 2. Concentric ring construction sequence [22].

The generated concentric rings form an adequate structure for Local Binary Pattern computation. The mesh-LBP operator, around a generic central facet \(f_c\), is defined as:

$$\begin{aligned} meshLBP_m^r(f_c) &= \sum_{k=0}^{m-1} s\left(h\left(f^r_k\right) - h\left(f_c\right)\right) \cdot \alpha(k), \\ \text{with}\quad s(x) &= \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases} \end{aligned}$$
(1)

where the parameters r and m control, respectively, the radial resolution and the azimuth quantization (see Fig. 2). Furthermore, a function \(\alpha (k)\) has been introduced to derive different LBP variants. In this work two variants have been studied:

  • \(\alpha _2(k)=2^k\), as originally suggested in [17];

  • \(\alpha _1(k)=1\), to obtain a simplified form that sums the binary pattern digits.

In Sect. 4 we will refer to these two functions as \(\alpha _2\) and \(\alpha _1\), respectively. The function h(f) can be any desired feature; it can represent shape or appearance information, depending on the feature used. For example, as a shape descriptor, a geometric feature such as mean curvature or curvedness can be extracted from the mesh surface, while gray-level values can represent appearance information. Such photometric values come from 2D flat images, acquired with standard cameras, and subsequently projected onto the mesh using a mapping scheme embedded in the mesh itself.

3 Fusion Schemes

In order to proceed, a brief description of the face recognition pipeline has to be presented. The mesh-LBP framework presented in [23] can be summarized in 5 main steps:

Feature extraction:

since a mesh manifold is a bare geometric structure, some features have to be extracted in order to describe the shape of the mesh surface.

Local Binary Pattern computation:

applying Eq. 1 using the features extracted beforehand as input data.

3D grid construction:

a grid is constructed and projected onto the mesh manifold, focusing on some stable regions of the face.

Histogram computation and concatenation:

for each point of the grid, a region is defined and a histogram is computed inside it; the concatenation of all the region histograms forms a signature for the examined face scan.

Face matching:

the probe scan is compared against a defined gallery.
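The histogram computation and concatenation step above, stitching per-region histograms into one signature, can be sketched as follows (a simplified sketch; the bin count and per-region normalization are our assumptions):

```python
def face_signature(region_codes, n_bins):
    """Concatenate one normalized histogram per grid region into a
    single signature vector (simplified sketch of the pipeline step).

    region_codes -- list of code lists, one per grid region; each code
                    is an integer mesh-LBP value in [0, n_bins).
    """
    signature = []
    for codes in region_codes:
        hist = [0.0] * n_bins
        for c in codes:
            hist[c] += 1.0
        total = sum(hist) or 1.0
        signature.extend(h / total for h in hist)  # per-region normalization
    return signature

# Two toy regions with mesh-LBP codes in [0, 3]
regions = [[0, 1, 1, 3], [2, 2, 0, 0]]
sig = face_signature(regions, n_bins=4)
print(len(sig))   # → 8 (n_regions * n_bins)
```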

 

As this framework operates at different levels over the same structure, it is possible to perform descriptor fusion at each level of the pipeline. In [22] it has been shown how a simple score fusion between geometric and photometric descriptors matches or sometimes even outperforms the state of the art [4, 10]. Furthermore, that work presents two fusion schemes at the histogram computation level: one concatenates two different histograms derived from geometric and photometric features (region histograms concatenation), while the other counts the co-occurrences of the two features (2D-histogram). Such fusions show the potential of climbing the face matching pipeline to merge different descriptors.

The idea proposed in this paper is to take a step forward and perform the fusion at the mesh-LBP computation level. Even if the results reported in [22] show a high accuracy rate, the histogram fusion increases the size of the face descriptor. In fact, the simpler region histograms concatenation doubles the original histogram size, while the 2D-histogram, which adds one dimension to the standard histograms, sees a multiplicative increase in size. Instead, if the fusion is performed during, or even before, the mesh-LBP computation, it is possible to use both geometric and photometric data while keeping the dimension and size equal to those of a single descriptor. Our aim is to produce a descriptor that keeps the same size obtained with a single feature, but holds the information of two features (shape and appearance in our case).

3.1 Early-Fusion

In this paper two kinds of early fusion are presented. The first is a very basic fusion scheme that uses logic operators (AND, OR and XOR). In order to obtain the LBP code, such operators have been added to the original formula:

$$\begin{aligned} meshLBP = \left\{ \begin{array}{l} AND(s_g(x),s_p(x))\\ OR(s_g(x),s_p(x))\\ XOR(s_g(x),s_p(x)) \end{array} \right. \end{aligned}$$
(2)

where \(s_g(x)\) and \(s_p(x)\) are computed as s(x) in Eq. 1, for geometric and photometric information respectively.

Fig. 3. Graphic comparison between the standard ordering of a ring with a single descriptor and the interleaving scheme with two descriptors.

In the second variant, the mesh-LBP pattern is generated by replacing the single-feature function h(f), shown in Eq. 1, with a combination of the extracted features \(h_g(f)\) and \(h_p(f)\). In particular, the new descriptor, named \(h_{g,p}(f)\), is composed of interleaved values from geometric and photometric data, respectively \(d^g\) and \(d^p\) (Fig. 3). For example, for an azimuth quantization \(m=12\), the \(h_{g,p}(f)\) sequence would be

$$\begin{aligned} h_{g,p}(f)= d_1^g, d_2^p, d_3^g, d_4^p, d_5^g, d_6^p, d_7^g, d_8^p, d_9^g, d_{10}^p, d_{11}^g, d_{12}^p \end{aligned}$$
(3)

Subsequently, the mesh-LBP code is obtained from the new combination \(h_{g,p}(f)\) by applying Eq. 1 (Fig. 4).

From now on, these two variants will be referred to as Logic Fusion (AND/OR/XOR) and Interleaving Fusion, respectively.

Fig. 4. Visual representation of the early-fusion mesh-LBP code generated with radius \(r=4\) and azimuth quantization \(m=12\), in both the \(\alpha _1\) and \(\alpha _2\) variants.

4 Experimentation

Experiments have been conducted on the Bosphorus database [3], which is composed of 4666 scans of 105 subjects acquired in different poses, action units, and occlusion conditions. In addition to the shape structure, represented as a mesh manifold, the database contains bitmap images of the scanned subjects to provide appearance information as well. Since the aim of this work is to build a new LBP-like descriptor that can embed the strong points of a 3D environment, we did not focus on the matching algorithm. A naive template-matching-like method has been used, where each probe face descriptor is compared with a reference gallery using the \(\chi ^2\) distance.

To compare our results with [22], the same features have been chosen for merging. In particular, in Table 1 we show the results obtained using the mean curvature to represent shape information, and the gray level, obtained from the bitmap mapped onto the mesh surface, for the appearance.

Table 1. Overall outcomes on the Bosphorus database showing the accuracy of the logic operators (AND, XOR, OR) and the interleaving scheme, compared with the single descriptors of [22]: mean curvature (H) and gray level (GL).
Table 2. Histogram sizes (per region), in number of bins, for each variant reported in Table 1.

Results from the logic fusions show an accuracy rate close to that of the original single descriptor. Even if the size of the logic descriptor is equal to that of a single one, the outcomes are not satisfying: this scheme shows a decrease in descriptive power with respect to what was achieved in our reference paper. In fact, the logic operators seem to annihilate the mutual information provided by the pair of features.

The interleaving scheme, instead, preserves the descriptive power of both geometric and photometric information, outperforming the single-descriptor precision and the above-mentioned histogram fusions. In particular, \(\alpha _1\), even if slightly lower in accuracy compared with the \(Fusion_1\) and \(Fusion_2\) schemes, sees a drastic decrease in descriptor size (Table 2): half with respect to the region histograms concatenation scheme, and on the order of the square root with respect to the 2D-histogram (13 times smaller). \(\alpha _2\), instead, not only keeps the same size as the single-feature histogram, but also outperforms the region histograms concatenation fusion scheme.

The effectiveness of the interleaving early-fusion approach becomes clear if we consider that the 2D-histogram fusion scheme cannot be computed for \(\alpha _2\), which is the original LBP variant. In that case, in fact, the 2D-histogram would have had \(1125 \times 136 = 153000\) bins, instead of the 1125 of our proposed fusion scheme.

5 Conclusion

In this paper a novel early-level fusion approach for real 3D face recognition has been presented. The proposed method exploits the potential of mesh manifolds as a support structure. In particular, we extended mesh-LBP, a framework that enables the generation of LBP-like codes directly on a triangular mesh. Our aim is to fuse different features during, or even before, the LBP descriptor computation. For this purpose, logic operators and interleaving schemes have been used to generate a pattern comprising photometric texture and geometric shape information. The experimentation, conducted on the Bosphorus database, shows promising results, raising the curtain on the potential held by early feature fusion on real 3D supports, like mesh manifolds. It is now possible to consider more refined early-fusion techniques employed directly on a mesh manifold. In this manner, we can retain the descriptive power of two, or even more, descriptors, improving performance without increasing the descriptor size.