
1 Introduction

As a biometric identification technology, automatic fingerprint recognition is widely used in judicial, government, commercial and financial fields because of its advantages such as easy access, strong operability and high reliability. An automatic fingerprint identification system (AFIS) [1] generally includes fingerprint acquisition, image enhancement, feature extraction, matching and other modules. Since the 1990s, the algorithms for each part of AFIS have been continuously improved [2,3,4]. Because of the importance of the fingerprint orientation field, many researchers have studied its estimation to improve the accuracy of fingerprint recognition.

One commonly used method is the gradient-based algorithm, which performs a difference operation on the latent image and is therefore very sensitive to image quality. Hong et al. [5] improved this method by filtering the orientation field with a low-pass filter while correcting isolated erroneous directions. Another is the model-based approach, which mainly uses global constraints to model the orientation field mathematically. Sherlock et al. [6] proposed a zero-pole model that describes the fingerprint orientation field based on the locations of singular points. However, this method fails when the fingerprint contains no singularity.

Dictionary-based approaches have also been proposed to improve latent orientation field estimation. Feng et al. [7] proposed a fingerprint orientation field extraction algorithm based on prior knowledge of fingerprint structure: a dictionary is constructed from a set of ground-truth orientation fields, and a compatibility constraint among neighboring orientation patches is enforced during estimation. The dictionary-based approach generalizes better than the model-based approach, but its performance relies on large and diverse dictionaries, which results in higher computational cost.

In recent years, deep learning has made remarkable achievements in the field of pattern recognition. Convolutional neural networks (CNNs) are widely used in image classification, object recognition, object detection and other fields [8,9,10]. Cao et al. [11] proposed a learning-based approach that uses a ConvNet to classify the orientation field of a latent patch as one of a set of representative orientation patterns. However, the quality of the pattern set is directly affected by the quality of the database. In 2017, Yao et al. [12] proposed an end-to-end deep convolutional network that combines domain knowledge with the representation ability of deep learning; for orientation estimation, a classification network based on DeepLab v2 [13] is adopted. This pipeline achieves better results with its expert network-marked labels, but it still has difficulty converging and easily falls into local optima.

Inspired by the abundant achievements in semantic segmentation [14] in recent years, we propose an effective orientation extraction framework for latent fingerprints. Considering the poor quality of latent images, we first design a preprocessing method combining local total variation (LTV) decomposition, band-pass filtering and Gabor filtering so that the input to the network is improved. The processed images are then passed to the proposed convolutional neural network for high-accuracy orientation field prediction. Experimental results on the test database show that the proposed system outperforms state-of-the-art fingerprint orientation estimation algorithms.

The contributions of this paper are summarized as follows:

  1. A new algorithm system specifically designed for fingerprint orientation estimation, consisting of a preprocessing part and a deep neural network part. Domain knowledge and the generalization ability of the network are combined in this system.

  2. An effective preprocessing scheme that enhances the latent ridge structure of poor-quality fingerprints through a specially designed combination of algorithms.

  3. A novel deep regression neural network (DRNN) with higher accuracy, faster training speed and easier convergence.

  4. A new structure derived from the traditional boosting algorithm, introduced into the proposed DRNN to solve the label discontinuity problem and significantly improve network performance.

2 Proposed Method

2.1 Methods Overview

The basic idea is to build an algorithm system specifically for fingerprint orientation estimation. In recent years, many works [12, 19, 20] have shown the necessity and the trend of combining the domain knowledge of traditional image algorithms with deep learning. Following this idea, we propose an algorithm that consists of a preprocessing part and a fully convolutional network part. First, the preprocessing part is introduced, which roughly extracts the effective information of the input images with a designed combination of traditional methods, including cartoon-texture decomposition and Gabor filtering. Second, we discuss how to construct a deep neural network that predicts the local orientations and makes full use of the preprocessed fingerprints (Fig. 1).

Fig. 1.

The block diagram of the proposed method.

2.2 Latent Fingerprint Preprocessing

First, the LTV model, a nonlinear filter pair that retains both the essential features of Meyer's model and the simplicity and speed of the linear model, is used to decompose the images. Then, a Log-Gabor filter [15] is used to enhance the latent ridge structure in the marked ROI. Each latent image is divided into non-overlapping blocks of 64 × 64 pixels. To avoid the edge effect of the filter, only the 16 × 16 pixels in the center of each block are kept after filtering. In the frequency domain, the two-dimensional Log-Gabor transfer function is defined in two parts:

$$ G(w) = \exp\left( -\frac{\left[ \ln\left( w/w_{0} \right) \right]^{2}}{2\left[ \ln\left( k/w_{0} \right) \right]^{2}} \right) $$
(1)
$$ G(\theta) = \exp\left( -\frac{\left( \theta - \theta_{0} \right)^{2}}{2\sigma_{\theta}^{2}} \right) $$
(2)

The final Log-Gabor filter is obtained as follows:

$$ G(w,\theta) = G(w) \cdot G(\theta) $$
(3)

Since the center frequency of the Log-Gabor filter needs to be determined in advance, an automatic optimization method is used to find the appropriate frequency iteratively. Then a bank of 12 directional filters is generated and used to obtain responses in 12 directions, of which the two orientations with the highest responses are selected. Finally, the enhanced blocks are combined to generate the whole enhanced latent.
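For concreteness, the following is a minimal sketch of how such a 12-orientation Log-Gabor filter bank could be applied to one block in the frequency domain. The center frequency w0, the bandwidth ratio k/w0 and the angular spread σθ below are illustrative placeholder values, not the ones found by the automatic optimization described above.

```python
import numpy as np

def log_gabor_bank(block, w0=0.1, k_ratio=0.55, sigma_theta=np.pi / 12, n_orient=12):
    """Filter one 64 x 64 block with 12 oriented Log-Gabor filters in the
    frequency domain and return the responses, shape (n_orient, H, W).
    All parameter values are illustrative defaults, not the tuned ones."""
    h, w = block.shape
    fy, fx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                              # avoid log(0) at the DC term
    theta = np.arctan2(fy, fx)

    # Radial part, Eq. (1): a Gaussian on the logarithmic frequency axis.
    radial = np.exp(-(np.log(radius / w0) ** 2) / (2 * np.log(k_ratio) ** 2))
    radial[0, 0] = 0.0                              # Log-Gabor has no DC component

    spectrum = np.fft.fft2(block)
    responses = []
    for i in range(n_orient):
        theta0 = i * np.pi / n_orient
        # Angular part, Eq. (2): distance on the orientation circle (period pi),
        # so ridges at theta0 and theta0 + pi are treated identically.
        d = np.abs(np.angle(np.exp(1j * (theta - theta0))))
        d = np.minimum(d, np.pi - d)
        angular = np.exp(-(d ** 2) / (2 * sigma_theta ** 2))
        # Eq. (3): the full transfer function is the product of both parts.
        responses.append(np.real(np.fft.ifft2(spectrum * radial * angular)))
    return np.stack(responses)
```

In the full pipeline, the two orientations with the highest response energy would then be selected for each block, and only its 16 × 16 center kept.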

2.3 Deep Regression Neural Network

DRNN.

Fingerprint orientation estimation can be regarded as a pixel-level segmentation problem after down-sampling. Instead of the classification networks widely used for image segmentation [14], a deep regression neural network (DRNN) is designed in this work. The outputs of the network are directly the predicted angles, allowing continuous estimates. Meanwhile, we find that with small samples and a relatively large number of categories, it is hard for classification networks to converge in practice. This is probably because, in a segmentation network, the last layer assigns every pixel to a class; this layer can be regarded as an aggregation of classification outputs, which is much sparser than the output of a single classification network. Suppose the aggregation's size is 20 × 20 × 90 (the setting in our network during training); then there are 20 × 20 = 400 ones in the aggregation and 20 × 20 × 89 = 35,600 zeros. Positive samples are far fewer than negative ones, as demonstrated in Fig. 2. A regression network, in contrast, has dense outputs and avoids this problem. The right panel of Fig. 2 shows the loss decrease over training iterations for the DRNN and the classification network. The two networks share the same backbone structure; the only difference is the final layer.
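As an illustration of the difference between the two output layers, the PyTorch-style sketch below contrasts a classification head with a regression head on the same backbone features; the 256-channel feature map and the layer names are hypothetical, chosen only to match the 20 × 20 × 90 example above.

```python
import torch
import torch.nn as nn

# Suppose the shared backbone ends with 256 feature channels on a 20 x 20 grid.
features = torch.randn(1, 256, 20, 20)

# Classification head: one of 90 discrete angle classes per pixel.
# Its one-hot target contains 400 ones vs. 35,600 zeros (sparse supervision).
cls_head = nn.Conv2d(256, 90, kernel_size=1)
cls_logits = cls_head(features)             # shape (1, 90, 20, 20)

# Regression head (DRNN): one continuous angle value per pixel.
# Every output element carries supervision, so the training signal is dense.
reg_head = nn.Conv2d(256, 1, kernel_size=1)
reg_angles = reg_head(features)             # shape (1, 1, 20, 20)
```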

Fig. 2.

(a) Demonstration of the classification layer of a segmentation network. Positive samples (red) are far fewer than negative ones (blue), causing sample imbalance during training. (b) Convergence curves of the classification network (top) and the regression network (bottom) with the same structure except for the final output layer. The x axis is the number of training iterations and the y axis is the normalized loss. The regression network converges much faster: its loss becomes stable after 800 iterations, while that of the classification network is still decreasing after 1800 iterations. Note that the final magnitudes of the losses do not reflect the two networks' performance (Color figure online).

Boosting Structure.

A new structure derived from the traditional boosting algorithm is introduced into the output part of the proposed DRNN. Boosting is a general machine learning technique for improving the accuracy of any given learning algorithm [16]. A boosting algorithm requires several weak learners and fuses their outputs with a certain strategy. As a result, the boosting structure solves the problem of discontinuity around 0° and produces a much more accurate output.

Our expected outputs are angles ranging from 0 to 180°. Angles near 0° and those approaching 180° are continuous in physical meaning but have a huge gap in scale, which causes mutations in the labels, as displayed in Fig. 3. This is the problem of discontinuity around 0°, and labels around 0° in physical meaning are called bad zones in the rest of this paper. Convolutional layers have the property of smoothing neighboring outputs, after which bad-zone outputs deviate. As shown in the left panel of Fig. 3, labels closer to a bad zone result in larger deviations in the outputs; the output changes by nearly 90° when the label is close to 0°. For this reason, if the regression result is directly taken as the final output, the proposed DRNN is only a weak learner in this situation. In this work, the boosting algorithm is introduced to upgrade this weak learner.
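A toy example makes this deviation concrete: linearly smoothing two almost parallel orientations that lie on opposite sides of the wrap-around point pulls the result toward 90°. The doubled-angle average in the last lines is shown only for comparison and is not part of the proposed method, which instead relies on the boosting structure described next.

```python
import numpy as np

# Two neighbouring pixels whose true ridge orientations are almost identical
# (2 deg and 178 deg differ by only 4 deg on the orientation circle).
neighbours = np.array([2.0, 178.0])

# Linear smoothing, as performed implicitly by convolutional layers,
# treats the labels as ordinary real numbers:
print(neighbours.mean())        # 90.0 -> roughly 90 deg away from the true orientation

# Doubling the angles before averaging removes the 0/180 deg wrap-around:
doubled = np.deg2rad(2 * neighbours)
circular_mean = np.rad2deg(np.angle(np.mean(np.exp(1j * doubled)))) / 2 % 180
print(circular_mean)            # ~0.0 -> the physically correct average
```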

Fig. 3.

Illustration of the label discontinuity around 0° caused by the angle definition (left). A cliff-type descent is observed, which is extremely harmful to network performance. Example of network outputs with a single pass way (middle) and after using the boosting structure (right). Predictions biased toward 90° in the middle panel are clearly corrected by the boosting structure.

Instead of only one output layer of angles, the network is adjusted to three identical pass ways, each with a different definition of 0°. Figure 4 shows the angle definitions of the three pass ways, in which the definitions for pass ways 2 and 3 are obtained from pass way 1 by (4) and (5). After this transformation, the bad zones of the three pass ways no longer overlap.

Fig. 4.

Demonstration of the angle definitions in the three pass ways. Angles increase counterclockwise. The 0° and 180° of the last two pass ways are defined as the first one's 60° and 120°, respectively.

$$ x_{2}' = \left\{ \begin{array}{ll} x_{2} + 120, & x_{2} < 60 \\ x_{2} - 60, & x_{2} \ge 60 \end{array} \right. $$
(4)
$$ x_{3}' = \left\{ \begin{array}{ll} x_{3} + 60, & x_{3} < 120 \\ x_{3} - 120, & x_{3} \ge 120 \end{array} \right. $$
(5)

The outputs of the three pass ways are first reversed to the normal definition, and outputs 1, 2 and 3 are the single results of the three pass ways at the same position. The output strategy of the network is then: if the difference between outputs 1 and 2 is less than 10°, the output is the average of the first two output channels; otherwise the output is the last one. At the modest cost of network simplicity, the impact of bad zones is eliminated, bringing a large improvement in output accuracy. Detailed data are given in the experiments section, and a minimal sketch of the fusion rule is given below.
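The sketch below implements this fusion rule as described, assuming per-pixel angle maps in degrees; details not specified in the text, such as the array layout, are filled in as assumptions.

```python
import numpy as np

def fuse_passways(out1, out2, out3, tol=10.0):
    """Fuse the three pass-way regressions into one orientation field (degrees).

    A minimal sketch of the fusion rule described above; out1/out2/out3 are
    per-pixel angle maps, each in its own 0-degree definition.
    """
    # Reverse pass ways 2 and 3 to the common (pass way 1) definition:
    # their zero axes sit at 60 and 120 degrees of pass way 1, respectively.
    out2 = (out2 + 60.0) % 180.0
    out3 = (out3 + 120.0) % 180.0

    # If pass ways 1 and 2 agree within the tolerance, average them;
    # otherwise one of them is near its bad zone, so fall back to pass way 3,
    # whose bad zone (around 120 degrees) does not overlap with theirs.
    agree = np.abs(out1 - out2) < tol
    return np.where(agree, (out1 + out2) / 2.0, out3)
```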

Network Architecture.

In practice, fingerprint images differ in size and aspect ratio, so a fully convolutional network is proposed for this task. The first part of the network consists of three Conv-ReLU blocks. Instead of pooling, a convolutional layer with stride 2 is used in each block to compress the variables, giving 8× down-sampling in total. This is because pooling layers create invariance to small shifts and distortions [17], which is an advantage in object detection tasks, whereas this task is sensitive to local rotation. Following the results in [18], the kernel sizes of the first part are set to 7 × 7, 5 × 5 and 3 × 3, respectively (Fig. 5).
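A minimal PyTorch-style sketch of this first part is given below; the kernel sizes and strides follow the text, while the channel widths are illustrative placeholders.

```python
import torch.nn as nn

# Sketch of the first part of the network: three Conv-ReLU blocks.
# Each block down-samples by 2 with a strided convolution instead of pooling,
# giving 8x down-sampling in total. Channel widths are illustrative only.
front_end = nn.Sequential(
    nn.Conv2d(1,  32, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
)
```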

Fig. 5.

Detailed network architecture. Pooling layers are replaced by stride-2 convolutional layers. Each pass way generates area information at two different scales. The three pass ways have the same kernel sizes and together constitute the boosting structure.

The second part of the network uses ASPP [13] layers of the same size in three parallel pass ways. In each pass way, two atrous convolutional layers with different sample rates are deployed, and the feature maps of both layers are fused together. The final layer is the direct overlap of the three pass ways' outputs; applying the boosting algorithm, the predicted orientation field is produced.
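The following sketch shows one possible realization of a single pass way under these assumptions; the channel widths and the two atrous sample rates (6 and 12) are placeholders rather than the trained configuration.

```python
import torch
import torch.nn as nn

class PassWay(nn.Module):
    """One of the three parallel pass ways: two atrous convolutions with
    different sample rates whose feature maps are fused, followed by a
    1x1 regression head producing one angle map. Channel widths and the
    dilation rates are illustrative, not the trained values."""

    def __init__(self, in_ch=128, mid_ch=128):
        super().__init__()
        self.branch_a = nn.Conv2d(in_ch, mid_ch, 3, padding=6,  dilation=6)
        self.branch_b = nn.Conv2d(in_ch, mid_ch, 3, padding=12, dilation=12)
        self.head = nn.Conv2d(mid_ch, 1, kernel_size=1)

    def forward(self, x):
        # Fuse the two atrous branches, then regress one angle per pixel.
        fused = torch.relu(self.branch_a(x)) + torch.relu(self.branch_b(x))
        return self.head(fused)

# Three pass ways with identical kernel sizes but different 0-degree label
# definitions; their outputs are combined by the boosting rule sketched above.
pass_ways = nn.ModuleList([PassWay() for _ in range(3)])
```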

Label, Loss Function and Training.

As the second part of the network has three pass ways, the labels are also transformed to match the designed regression outputs. Instead of the traditional quadratic error between the labels and the regression results, the loss function is defined as:

$$ loss = \frac{1}{NM}\sum \left( 1 - \left( 20 \cdot labels - 1 \right)^{2} \right) \cdot 100 \cdot \left( new\_labels - reg\_result \right)^{2} $$
(6)

where N is the size of the output orientation field, M is the batch size, \( reg\_result \) is the regression result of the network's second part, labels denotes the original labels, and new_labels denotes the transformed labels. According to the scale of the loss, the scale of the labels can be adjusted by multiplying by a constant; to some extent, the DRNN's convergence speed can be controlled in this way. In our experiments, rather than [0, 180), we found that smaller labels mapped into the range [0, 0.01) help the network converge much more easily. To improve the accuracy of the results, the weight \( 1 - \left( 20 \cdot labels - 1 \right)^{2} \) is added to the loss function, so that bad zones receive negligible weight: we do not care what the bad zones predict and only consider the accuracy of the effective areas. To speed up training and improve network performance, all input images are first normalized and masked.
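Written out as code, Eq. (6) is a weighted mean-squared error. The sketch below assumes PyTorch tensors and labels already rescaled as described; it illustrates the formula rather than reproducing the authors' implementation.

```python
import torch

def drnn_loss(reg_result, new_labels, labels):
    """Weighted regression loss of Eq. (6), written as a PyTorch sketch.

    reg_result : regression output of the network's second part
    new_labels : labels transformed into each pass way's angle definition
    labels     : original (rescaled) labels, used only to weight the loss
    The 1/(N*M) factor is handled by the mean over all elements.
    """
    weight = 1.0 - (20.0 * labels - 1.0) ** 2     # bad zones get negligible weight
    return (weight * 100.0 * (new_labels - reg_result) ** 2).mean()
```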

After reversing the regression result to a single channel using the method of the boosting structure, the accuracy is defined as:

$$ accuracy = 1 - \frac{\sum \left| labels - output \right|}{N} $$
(7)

In the training process, to increase the number of samples, we segment the training images into overlapping 160 × 160 blocks. Latent fingerprints are used directly as inputs; their labels are of lower quality than those of library fingerprints, but their patterns match the required inputs. In the testing process, we use the test images directly as input, because the input size of our system is not constrained and the impact of edge effects can be eliminated.
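A minimal sketch of this block extraction is given below. The stride of 80 pixels and the 1/8-resolution label crop, which matches the network's 8× down-sampling, are our own assumptions for illustration; the text only states that the 160 × 160 blocks overlap.

```python
import numpy as np

def extract_training_blocks(image, label, block=160, stride=80):
    """Cut a training fingerprint and its orientation label map into
    overlapping 160 x 160 blocks. The label map is assumed to be stored at
    1/8 of the image resolution, matching the network's down-sampling."""
    blocks = []
    h, w = image.shape[:2]
    for y in range(0, h - block + 1, stride):
        for x in range(0, w - block + 1, stride):
            blocks.append((image[y:y + block, x:x + block],
                           label[y // 8:(y + block) // 8, x // 8:(x + block) // 8]))
    return blocks
```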

3 Experiments

3.1 Database

The database used in this paper was collected by Beijing Hisign Technology Co., Ltd, a winner of FVC-Ongoing 2017. The fingerprints are divided into two groups, library fingerprints and latent fingerprints; every latent image has a matched library fingerprint image, for a total of 2164 pairs. 500 pairs are used as testing samples and the rest are used for training. Each latent fingerprint is 512 × 512 pixels at 500 ppi, and each library fingerprint is 640 × 640 pixels at 500 ppi. The latent images' orientations are to be estimated and used to enhance the input latent fingerprints. Lacking ground-truth orientation information, labels are produced by the fingerprint recognition SDK of Beijing Hisign Technology Co., Ltd. The library fingerprints' labels are more accurate, while the latent images' labels contain more mistakes.

3.2 Identification Performance

To evaluate the quality of our output orientations, an objective comparison with other methods is made. The Gabor-based algorithm extracts the orientation field from the Gabor phase. The template-based algorithm extracts orientation fields by first clustering label block templates and then classifying fingerprint blocks into templates with a learned deep network. FingerNet extracts the orientation field with a learned fully convolutional network based on DeepLab v2, and is re-trained and tested on the same data set as ours. As our labels were generated with the Hisign SDK, the SDK's own performance is also reported. After obtaining the output orientation fields of each method, the same enhancement method is used to fuse the orientation information with the latent fingerprint images. Finally, the Hisign SDK is used to obtain the matching accuracy of each method. The results are shown in Table 1.

Table 1. Matching results of each method on the testing dataset

The Cumulative Match Characteristic (CMC) curves of the above seven methods on the 500 latent images are shown in Fig. 6. Following the control variable principle, FingerNet (yellow) is re-trained and tested on the same data set as ours. For easier comparison, the results of some methods are placed separately in another figure below, so that the trend of the recognition rate curves can be seen clearly.

Fig. 6.

Identification performance (CMC curves) of different algorithms on all 500 latents.

The results show that our method achieves an accuracy of over 85% in the rank-1 matching test, which is clearly better than the Gabor-based or masking method. The result is also 1.6 percentage points higher than that obtained with FingerNet's outputs. The boosting structure and the preprocessing make a clear contribution to the improvement of output quality. Compared with the SDK's result, our method obtains some increase in accuracy, which indicates that the network has generalization ability and corrects some mistakes made by the SDK.

Figure 7 shows the threshold-recall curves of the proposed method and FingerNet. Recall is defined as the proportion of test images whose average angular precision is higher than the threshold. It shows that the proposed method produces results closer to the labels than FingerNet. Figure 8 visually compares the orientation fields from the top two algorithms on latent fingerprints, together with the original latent images. We observe that the proposed algorithm outperforms the other algorithms on latent fingerprints.
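For reference, the threshold-recall curve of Fig. 7 can be computed as sketched below, assuming one average angular precision value per test image; the function name and inputs are hypothetical.

```python
import numpy as np

def threshold_recall(per_image_precision, thresholds):
    """Recall at each threshold: the fraction of test images whose average
    angular precision exceeds that threshold (the curve shown in Fig. 7)."""
    per_image_precision = np.asarray(per_image_precision)
    return [float(np.mean(per_image_precision > t)) for t in thresholds]
```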

Fig. 7.

Threshold-recall curves of proposed method and FingerNet.

Fig. 8.

Result comparison of different methods on several cropped latents. (a) Original fingerprints, (b) orientation fields obtained by the proposed method, and (c)–(d) enhanced images obtained by the proposed method and the SDK method.

4 Conclusion and Future Work

We propose a complete system, comprising preprocessing and orientation estimation, to produce more accurate orientation fields for latent fingerprints. The system combines domain knowledge obtained from preprocessing with contextual information generated by the deep learning method, and outperforms other orientation estimation algorithms. For better and faster training, a regression rather than a classification network is designed to produce the output orientation field. To eliminate errors in bad zones, a boosting algorithm and a new output structure are adopted in the network design.

Future work will include (1) integration of the whole system, (2) optimization of the network and the preprocessing, and (3) extension of this system to enhancement and matching.