1 Introduction

Finger-vein recognition technology uses the texture of the finger veins to perform identity verification; it is harmless and difficult to forge. Finger-vein images are relatively easy to acquire, and the recognition process is user-friendly. Therefore, finger-vein recognition can be widely applied to access control systems in fields such as banking, finance, and government.

Finger veins are distributed below the skin in complex shapes. The morphology of the finger veins results from the interaction between human DNA and finger development, and different fingers of the same person have different morphologies. These biological properties guarantee the uniqueness of the finger vein and lay a solid biological foundation for finger-vein biometrics.

Typically, finger-vein images are captured with near-infrared (NIR) light in a transillumination manner [1]. During transmission, the NIR light is absorbed by hemoglobin in the venous blood [2], forming a finger-vein image with light and dark vascular lines. The quality of finger-vein images is often poor due to the attenuation of light in tissue [3]. Therefore, it is often difficult to extract reliable finger-vein features directly from the original images [4].

In some cases, finger-vein images may exhibit irregular incompleteness due to external factors, such as spots or stains on the fingers at capture time, as shown in Fig. 1. Hence, incomplete vascular networks are a common phenomenon in finger-vein images.

Fig. 1. Finger-vein images with spots or stains.

For accurate feature extraction, generating a realistic finger-vein vascular network from the captured image is an important topic. To the best of our knowledge, few works deal with incomplete finger-vein acquisition, which motivates our work.

Recently, Convolutional Neural Networks (CNNs) have been widely applied in computer vision, especially in image classification and image generation [5]. They can also be used for image inpainting and reconstruction, and restoring finger-vein images with spots or stains is an instance of the image inpainting problem, so CNNs can be applied to it as well. In [6], a multi-scale neural patch synthesis method is proposed, which achieves good performance for high-resolution image inpainting on the ImageNet dataset. In general, achieving reasonable inpainting results requires a large number of training images. In [7], an image inpainting method based on contextual attention is proposed, which is very effective on large datasets such as the CelebA face dataset. However, our dataset does not contain enough images to train such models, and low-resolution grayscale images can further degrade the inpainting result. [8] proposes a context-encoder approach for image inpainting using a combination of the reconstruction (L2) loss and an adversarial loss [9]. Nevertheless, when applied to finger-vein images, it generates blurred veins without smooth edges.

In this paper, inspired by these methods, we propose an inpainting scheme for finger-vein images with spots or stains. The proposed scheme is as follows. First, a combination of Gabor filters and Weber's law is used for image enhancement, removing illumination variation in the finger-vein images. Second, we design a novel finger-vein image inpainting framework based on an encoder-decoder network. Finally, different loss functions are compared to optimize the inpainting framework. Experimental results show that the proposed method achieves better performance when inpainting finger-vein images with irregular incompleteness.

2 Finger-Vein Image Acquisition

The finger is among the most flexible parts of the human body, and finger-vein images can be captured by placing the finger into an imaging device. To obtain finger-vein images, we designed a homemade acquisition device [10], as shown in Fig. 2(a). The device uses NIR light to illuminate the finger, and the vascular network of the finger vein is acquired by an image sensor.

Extracting regions of interest (ROIs) is essential for improving the accuracy of finger-vein recognition. We employ the effective method proposed in [11] to locate ROIs in the finger-vein images, as shown in Fig. 2(b). Some finger-vein ROIs from the same subject are listed in Fig. 2(c).

Fig. 2. Image acquisition.

The homemade dataset includes 5,850 grayscale finger-vein images of the kind commonly used for biometric recognition. The ROIs of the captured finger-vein images are resized to \(91\times 200\) pixels. We enhance the grayscale images and resize them to \(96\times 192\) pixels. Most of the finger-vein images are complete, and only a few are incomplete due to the acquisition process; this imbalanced class distribution can hinder model training. Therefore, we manually added samples of finger-vein images with spots or stains, as shown in Fig. 3. They are incomplete finger-vein images with a square region, a single irregular region, or multiple irregular regions, and these incomplete cases need to be reconstructed in the experiments. The encoder-decoder network is trained to regress the corrupted pixel values and reconstruct complete images.
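The manual corruption step can be sketched as follows, assuming a simple random-blob model; the blob shapes, sizes, and fill value are illustrative assumptions, not the exact procedure used to build the dataset:

```python
import numpy as np

def add_random_stain(img, num_spots=1, max_radius=15, fill_value=255, seed=None):
    """Overlay irregular bright blobs on a grayscale image to simulate
    spots or stains (a hypothetical corruption model)."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    yy, xx = np.ogrid[:h, :w]
    for _ in range(num_spots):
        cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
        # A short random walk of overlapping discs yields an irregular blob.
        for _ in range(int(rng.integers(5, 20))):
            r = int(rng.integers(3, max_radius))
            mask |= (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
            cy = int(np.clip(cy + rng.integers(-r, r + 1), 0, h - 1))
            cx = int(np.clip(cx + rng.integers(-r, r + 1), 0, w - 1))
    out[mask] = fill_value  # spots appear as large pixel values
    return out, mask
```

Pairs of corrupted output and the original image then serve as training input and ground truth.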

Fig. 3. Finger-vein images with spots or stains.

3 Method

3.1 Image Enhancement

In NIR imaging, finger-vein images are often severely degraded, resulting in particularly poor separation between vein and non-vein regions (see Fig. 4(a)). To reliably strengthen the finger-vein networks, the images must be effectively enhanced. Here, a bank of Gabor filters [12] with 8 orientations and the Weber's Law Descriptor (WLD) [13] are combined for vein-region enhancement and light-attenuation elimination (see Fig. 4(b)). The Gabor filter is a linear filter for edge extraction that is well suited to expressing and separating texture; this paper uses 8 orientations to extract features. The WLD is used to improve robustness to illumination.
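A minimal sketch of this enhancement step follows. The Gabor parameter values (kernel size, sigma, wavelength) are assumptions not specified in the paper, and a Weber's-law-style contrast normalization stands in for the full WLD:

```python
import numpy as np

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lambd=8.0, gamma=0.5):
    # Real part of a Gabor kernel; the parameter values are illustrative.
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) \
        * np.cos(2 * np.pi * xr / lambd)

def conv2_same(img, k):
    # Circular 2-D convolution via FFT (adequate for a sketch).
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, img.shape)))

def enhance(img):
    img = img.astype(np.float64)
    # Vein ridges respond strongly to at least one of the 8 orientations.
    bank = [gabor_kernel(theta=k * np.pi / 8) for k in range(8)]
    gabor = np.max([conv2_same(img, k) for k in bank], axis=0)
    # Weber's-law-style contrast: response relative to local intensity,
    # which suppresses slowly varying illumination.
    local = conv2_same(img, np.ones((5, 5)) / 25.0)
    return np.arctan(gabor / (local + 1e-6))
```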

Fig. 4. The results of image enhancement.

3.2 Image Inpainting Scheme

The inpainting scheme for finger-vein images with incomplete information consists of four steps. First, finger-vein images with spots or stains are fed into the encoder as input images; the spot or stain regions are represented by larger pixel values so that they appear more distinct, and latent features are learned from the input images. Second, the learned features are propagated to the decoder through a channel-wise fully-connected layer. Third, the decoder uses this feature representation to recover the image content under the spots or stains; the output images of the encoder-decoder network have the same size as the input images. Finally, the inpainted images are optimized by comparison with the ground-truth images. Figure 5 presents the overall architecture of the proposed image inpainting scheme.
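The four steps above can be exercised end-to-end as a single training step. This is an illustrative sketch: `net`, `optimizer`, and `loss_fn` are placeholders for the encoder-decoder network, optimizer, and loss described later in the paper.

```python
import torch
import torch.nn as nn

def train_step(net, optimizer, corrupted, ground_truth, loss_fn):
    """One optimization step of the inpainting pipeline:
    encode -> propagate features -> decode -> compare with ground truth."""
    optimizer.zero_grad()
    inpainted = net(corrupted)              # encoder-decoder forward pass
    loss = loss_fn(ground_truth, inpainted)  # compare with ground truth
    loss.backward()
    optimizer.step()
    return loss.item()
```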

Fig. 5. The overall process of finger-vein image inpainting.

Encoder-Decoder Generative Network. Figure 6 shows an overview of our encoder-decoder generative network architecture, which consists of three blocks: the encoder, a channel-wise fully-connected layer, and the decoder. The encoder is derived from the AlexNet architecture [14]; its role is to compress high-dimensional input data into a low-dimensional representation. The encoder has five convolutional layers with \(4\times 4\) kernels. The first convolutional layer uses a stride of [2, 4] to reduce the spatial dimensions, producing a square feature map of \(48\times 48\); the following four convolutional layers use a stride of [2, 2]. Given an input image of size \(96\times 192\), these five convolutional layers compress the image into a feature representation of dimension \(3\times 3\times 768\). The channel-wise fully-connected layer is a bridge that propagates information from encoder features to decoder features (see Fig. 7). The decoder reconstructs the input image using five up-sampling layers, which expand the \(3\times 3\times 768\) feature representation abstracted by the encoder into an image of size \(96\times 192\).
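A PyTorch sketch consistent with the stated shapes (\(96\times 192\) input, \(48\times 48\) after the first stride-[2, 4] convolution, a \(3\times 3\times 768\) bottleneck, five up-sampling layers back to \(96\times 192\)); the intermediate channel widths, normalization, and activations are assumptions:

```python
import torch
import torch.nn as nn

def down(cin, cout, stride=2):
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride, 1),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))

def up(cin, cout):
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, 2, 1),
                         nn.BatchNorm2d(cout), nn.ReLU())

class ChannelWiseFC(nn.Module):
    """One dense map per channel over the 3x3 = 9 spatial positions,
    spreading information across the map with far fewer parameters
    than a full dense layer."""
    def __init__(self, channels, n):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, n, n) / n ** 0.5)
    def forward(self, x):
        b, c, h, w = x.shape
        y = torch.einsum('bcn,cnm->bcm', x.flatten(2), self.weight)
        return y.view(b, c, h, w)

class InpaintingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            down(1, 64, stride=(2, 4)),   # 96x192 -> 48x48
            down(64, 128),                # -> 24x24
            down(128, 256),               # -> 12x12
            down(256, 512),               # -> 6x6
            down(512, 768),               # -> 3x3x768 bottleneck
        )
        self.cwfc = ChannelWiseFC(768, 9)
        self.decoder = nn.Sequential(
            up(768, 512), up(512, 256), up(256, 128), up(128, 64),  # 3x3 -> 48x48
            nn.ConvTranspose2d(64, 1, 4, stride=(2, 4), padding=(1, 0)),  # -> 96x192
            nn.Sigmoid(),
        )
    def forward(self, x):
        return self.decoder(self.cwfc(self.encoder(x)))
```

With \(4\times 4\) kernels and padding 1, each stride-2 convolution halves a spatial dimension, which is what makes the stated shape sequence work out.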

Fig. 6. Overview of our basic encoder-decoder generative network architecture.

Fig. 7. Connection between encoder features and decoder features.

3.3 Loss Function

There are usually multiple plausible ways to fill image content occluded by spots or stains, and different loss functions lead to different inpainting results. The optimizer minimizes the loss between the inpainted images and the ground-truth images; a proper loss function makes the inpainted images realistic and consistent with the given context. In this paper, we employ the L1 loss to train the proposed finger-vein image inpainting model. In [6], the joint L2 and adversarial loss achieved good performance for image inpainting. Our comparative experiments use the L2 loss, the joint L2 and adversarial loss, and the joint L1 and adversarial loss, in the same way as the Context Encoder [8]. For each training image, the L1 and L2 losses are defined as:

$$\begin{aligned} L_{L1}(G)={E}_{x,x_{g}}[||x-G(x_{g})||_{1}], \end{aligned}$$
(1)
$$\begin{aligned} L_{L2}(G)={E}_{x,x_{g}}[||x-G(x_{g})||_{2}^{2}], \end{aligned}$$
(2)

where x represents the ground-truth image, \(x_{g}\) denotes a finger-vein image with spots or stains, G denotes the encoder-decoder generative network, and \(G(x_{g})\) represents the generated inpainted image.

The adversarial loss is defined as:

$$\begin{aligned} L_{adv}(G)={E}_{x_{g}}[-\log [D(G(x_{g}))+\sigma ]], \end{aligned}$$
(3)

where D is an adversarial discriminator that predicts the probability that the input image is a real image rather than a generated one, and \(\sigma \) is set to a small value to avoid taking the logarithm of zero.

The joint L2 loss with adversarial loss is defined as:

$$\begin{aligned} L=\mu L_{L2}(G)+(1-\mu )L_{adv}(G), \end{aligned}$$
(4)

The joint L1 loss with adversarial loss is defined as:

$$\begin{aligned} L=\mu L_{L1}(G)+(1-\mu )L_{adv}(G), \end{aligned}$$
(5)

where \(\mu \) is a weight used to balance the magnitudes of the two losses in our experiments.
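Eqs. (1)-(5) can be combined into a single function, sketched below; the values of \(\mu \) and \(\sigma \) here are illustrative placeholders, since the paper does not state them in this section:

```python
import torch

def joint_loss(x, g_out, d_prob, mu=0.999, sigma=1e-8, use_l1=True):
    """Joint reconstruction + adversarial loss, Eqs. (1)-(5).

    x      : ground-truth images
    g_out  : generator (encoder-decoder) outputs G(x_g)
    d_prob : discriminator probabilities D(G(x_g))
    mu, sigma are illustrative values, not the paper's.
    """
    # Eq. (1) or Eq. (2): reconstruction term
    rec = (x - g_out).abs().mean() if use_l1 else ((x - g_out) ** 2).mean()
    # Eq. (3): adversarial term, with sigma guarding log(0)
    adv = -torch.log(d_prob + sigma).mean()
    # Eq. (4) / Eq. (5): weighted combination
    return mu * rec + (1 - mu) * adv
```

Setting `use_l1=True` and dropping the adversarial term (`mu=1`) recovers the plain L1 loss the paper ultimately adopts.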

3.4 Evaluation

Peak Signal-to-Noise Ratio (PSNR) is a full-reference image quality index, computed here between the ground-truth image and the inpainted image:

$$\begin{aligned} MSE=\frac{1}{HW}\sum _{i=1}^H\sum _{j=1}^W(X(i,j)-Y(i,j))^2, \end{aligned}$$
(6)
$$\begin{aligned} PSNR=10\lg \frac{(2^n-1)^2}{MSE}, \end{aligned}$$
(7)

where X denotes the ground-truth image and Y the inpainted image; H and W are the height and width of the image; n is the number of bits per pixel, generally 8, i.e., 256 gray levels.
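Eqs. (6)-(7) translate directly into code:

```python
import numpy as np

def psnr(x, y, n_bits=8):
    """PSNR (dB) between ground-truth image x and inpainted image y."""
    # Eq. (6): mean squared error over all H*W pixels
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    peak = (2 ** n_bits - 1) ** 2
    # Eq. (7): ratio of peak signal power to MSE, in decibels
    return 10 * np.log10(peak / mse)
```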

We report the mean L1 loss, mean L2 loss, and PSNR on the test set; our method performs better on these metrics in the experiments.

4 Experiments

We evaluate the proposed inpainting model on the homemade dataset, which includes 5,850 finger-vein images: 5,616 for training, 117 for validation, and 117 for testing. Our encoder-decoder generative network is trained with four different loss functions to compare their performance; the other parameters of the encoder-decoder are set identically in each case. The four loss functions are: (a) L2+adv loss, (b) L1+adv loss, (c) L2 loss, and (d) L1 loss. In the result figures, the first row shows three input images with arbitrary incompleteness, rows (a)-(d) show the results of the corresponding loss functions, and the ground-truth images corresponding to the inputs are placed in the last row. In the experiments, the incomplete regions are randomly generated. In the following, the effectiveness of our method is illustrated both visually and numerically; in Sect. 4.3, the practicality of the proposed method is verified on finger-vein images with square-region incompleteness.

4.1 Single Irregular-Region Incomplete

We use the four methods discussed above to reconstruct finger-vein images with a single spot or stain, i.e., one irregular incomplete region needs to be reconstructed. The encoder-decoder generative network is trained with a constant learning rate of 0.0001. The inpainting results for single irregular-region incompleteness using the four loss functions are shown in Fig. 8. High-quality inpainting results are not only clear in the finger-vein vascular networks but also consistent with the surrounding regions, even though the spots or stains have different shapes. In practice, the L2+adv and L1+adv losses produce blurred images without smooth vein edges, and the pixel values of the vein regions are obviously lost with the L2 loss. Compared with the other methods, the method proposed in this paper generates a smooth and complete finger-vein network. Table 1 shows the quantitative results of these experiments: our method achieves the lowest mean L1 loss and the highest PSNR.

Fig. 8. Performance comparison of the four methods on single irregular-region inpainting.

Table 1. Numerical comparison of the four methods on single irregular-region incompleteness.

4.2 Multiple Irregular-Region Incomplete

Similarly, we use the four loss functions to reconstruct finger-vein images with multiple spots or stains, i.e., multiple regions need to be reconstructed. The inpainting results for multiple irregular-region incompleteness are shown in Fig. 9. In practice, the methods based on the L2+adv and L1+adv losses produce blurred images without smooth vein edges, with pixels clustered together without any regularity; the L2 loss again makes the loss of the original pixel values more obvious. In contrast, the inpainted images based on the L1 loss are closer to the ground-truth images than those of the other methods. As Table 2 shows, the PSNR value is also higher than that of the other methods. These results indicate that the proposed method achieves higher similarity to the ground-truth images than the other methods.

Fig. 9. Performance comparison of the four methods on multiple irregular-region inpainting.

Table 2. Numerical comparison of the four methods on multiple irregular-region incompleteness.

4.3 Square-Region Incomplete

The practicality and effectiveness of the proposed method are further verified on finger-vein images with square-region incompleteness. Here, we again use the four methods to reconstruct the images; the inpainting results are shown in Fig. 10. The results of our proposed method are closer to the ground-truth images, whereas blurred images are generated with the L2+adv and L1+adv losses and the pixel values are seriously lost in the masked vein regions. Both the visual results and the numerical data show that our proposed method is effective (Table 3).

Fig. 10. Performance comparison of the four methods on square-region inpainting.

Table 3. Numerical comparison of the four methods on square-region incompleteness.

5 Conclusion

In this paper, we propose a method based on the L1 loss function for inpainting finger-vein grayscale images with spots or stains. A series of experiments with four methods demonstrates that the proposed method is effective. As future work, we plan to extend the method so that it performs best on all evaluation metrics.