1 Introduction

As one of the most convenient and secure identification technologies available, biometric identification has received increasing attention from academia and industry [1,2,3]. Recently, as an emerging biometric trait for identity authentication, the hand dorsal vein has been shown to have considerable potential and practical significance, whether as a primary or auxiliary means of identification [4]. The characteristics of the hand dorsal vein are considered to be unique, with a distinctiveness comparable to that of the retina. Compared with other popular biometric traits, hand dorsal vein recognition, as a non-invasive form of biometric authentication, offers four advantages: high security, ease of use, rapid identification, and high accuracy [5].

Traditional hand dorsal vein recognition is mainly based on the characteristics of the veins, such as their width and direction. The main processes include image acquisition, preprocessing, feature extraction, and feature matching. Firstly, a hand dorsal vein image is collected by a CCD camera under an infrared beam of 700–1000 nm [6]. Then, a series of preprocessing operations such as filtering are performed to obtain a vein pattern. Finally, Scale Invariant Feature Transform (SIFT), Gabor, Support Vector Machine (SVM), hash coding, and other algorithms are used for feature extraction and matching to obtain recognition results. However, traditional methods are easily affected by the type of database and the external environment, so they often fail to achieve ideal recognition results. In recent years, deep learning has developed rapidly. Because of its powerful identification capabilities, many researchers have applied deep learning networks to biometrics, especially Convolutional Neural Networks (CNN) [7]. The CNN is a feedforward neural network designed for image classification and recognition, and it has been successfully used in biometrics such as palmprint recognition [8] and face recognition [9]. Here, a CNN is used to identify the hand dorsal veins.

In this paper, a method for recognizing Hand dorsal vein Based on Deep hash network (DHN) [10] is proposed, called HBD. DHN is a deep supervised hashing method integrating deep convolutional neural networks and hash coding. Due to its high precision and high efficiency, DHN is mainly used for large-scale image retrieval [11, 12]. In [13], DHN was also used for palmprint recognition with great success, but the method proposed there is not an end-to-end recognition network. In contrast, HBD is an end-to-end recognition network, which takes an image as input and outputs a hash code. First, the preprocessed hand dorsal vein image is input into the simplified Convolutional Neural Networks-Fast (SCNN-F) [14] to obtain the convolution features. SCNN-F is simpler than the CNN-Medium and CNN-Slow architectures, so it is more efficient. At the final fully connected layer, a sgn function is used to convert the output of each neuron to −1 or 1, so that each image is encoded as a K-bit hash code. In theory, the more likely two images are to come from the same person, the more similar their features are, and the more similar their hash codes are. Hence, by comparing the Hamming distance between the hash codes of every image pair, it can be judged whether they belong to the same category. The overview of HBD is shown in Fig. 1.

Fig. 1. Overview of our proposed hand dorsal vein identification based on HBD.

The objective of this paper is to further improve the accuracy of hand dorsal vein recognition through deep learning. Experiments are performed on the NCUT (North China University of Technology) [15], GPDS (Digital Signal Processing Group at the University of Las Palmas de Gran Canaria) [16], and NCUT+GPDS databases to evaluate the method. Experimental results show that the performance of HBD reaches the same level as the state of the art. When the ratio of training to test samples is 7:3, the accuracy is higher and the Equal Error Rate (EER) falls to 0.08% on NCUT. The specific contributions of our work are as follows:

  (a) Based on DHN, we propose HBD for hand dorsal vein recognition. With proper loss and training strategies, HBD achieves effective results on the NCUT, GPDS, and NCUT+GPDS hand dorsal vein databases, which were collected with different devices.

  (b) SCNN-F is applied in HBD. The structure of SCNN-F is simpler, with only four convolutional layers. While maintaining accuracy, HBD has a lower storage cost and a faster query speed than the other methods based on VGG-Net.

The rest of the paper is organized as follows: Sect. 2 mainly introduces the related work. Section 3 presents the HBD method in detail. The detailed experiments and result analysis are presented in Sect. 4. Section 5 concludes the paper.

2 Related Work

Hand dorsal vein recognition is a new type of biometric technology developed in recent years and has received extensive attention. In terms of theoretical research, currently used methods for identifying the hand dorsal veins include vein image template matching methods and vein feature recognition-based methods. Tang et al. [17] used SIFT to realize vein recognition. To reduce the complexity of the feature matrices, Khan et al. [18] used the Principal Component Analysis (PCA) algorithm while preserving the information they carry. Lajevardi et al. [19] used a novel algorithm called biometric graph matching (BGM), which extracted the global features of vein images and achieved relatively high accuracy with small and concise templates. Li et al. [20] proposed a modified Pyramid Local Binary Pattern (PLBP) by adding feature weighting, which combined multi-scale PLBP with structure information partition. Li et al. [21] built the Width Skeleton Model, taking both the topology of the vein network and the width of the vessels into account.

In recent years, with the development of neural network technology, a large number of methods based on deep learning have also appeared in the field of hand dorsal vein recognition. J. Wang and G. Wang [22] introduced the regularized Radial Basis Function (RBF) network into the CNN to realize the recognition task. Li et al. [23] investigated deep learning-based methods for hand dorsal vein recognition, and implemented AlexNet, VGG-Net, and GoogLeNet. Wan et al. [24] trained Reference-CaffeNet, AlexNet, and VGG deep CNNs to extract vein image features, and the final recognition accuracy was over 99%.

As for DHN, it is mainly used for large-scale image retrieval. Peng and Li [11] proposed a binary hashing learning method based on DHN to accomplish large-scale image retrieval. Song and Tan [12] presented a method to generate multi-level hashing codes for image retrieval based on DHN, and verified its effectiveness on several datasets. Using a CNN and supervised hashing, Cheng et al. [13] proposed a novel learnable palmprint coding representation and achieved satisfactory accuracy.

3 The Structure of HBD

DHN is an end-to-end framework for deep feature learning and binary hash encoding, combining a CNN with a hashing algorithm [25]. Based on DHN, HBD is also an end-to-end network for hand dorsal vein recognition. In HBD, every hand dorsal vein image is first input into the neural network. After the convolution and pooling operations, the convolution features are extracted and output at the last fully connected layer. Then the output of each neuron in the output layer is converted to one bit of a code. Ultimately, each hand dorsal vein image is converted into a K-bit hash code. Images from the same person have similar hash codes and the distance between them is short, while the codes of different people differ greatly. The focus of the HBD method is to design the structure of the CNN and the loss function appropriately.

3.1 The Structure of CNN

For the proposed HBD, the neural network structure has a great influence on the final recognition results. In fact, the efficiency of deep learning has always been a key factor restricting its wider application. For the same sample data, a complex network structure can obtain higher accuracy, but at the same time it imposes a heavy computational burden. In this paper, the Convolutional Neural Networks-Fast (CNN-F) is used as the neural network to obtain convolutional features. CNN-F is simpler than other popular network structures such as VGG-Net, and has been successfully used for palmprint recognition [13]. Due to the limited sample data size, the CNN-F network is simplified to avoid overfitting. The SCNN-F is shown in Table 1. SCNN-F consists of four convolutional layers and three fully connected layers. The last layer has 128 neurons. The activation function in all layers except the last is the Rectified Linear Unit (ReLU). In order to achieve coding, the tanh function is used as the activation function in the last fully connected layer, which ensures that the output of each neuron is limited to between −1 and 1. Then, by using the sgn function, the output value is set to −1 or 1. Therefore, every image is ultimately encoded as a 128-bit hash code.
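The encoding step described above, tanh followed by sgn, and the Hamming comparison of the resulting codes can be illustrated with a minimal sketch. The function names `encode` and `hamming_distance`, the 4-value toy inputs, and the choice of mapping an exact zero to +1 are assumptions made for illustration only; they are not taken from the paper.

```python
import math


def encode(pre_activations):
    """Map last-layer pre-activations to a {-1, +1} code via tanh, then sgn."""
    relaxed = [math.tanh(z) for z in pre_activations]  # each value lies in (-1, 1)
    # sgn step; mapping an exact 0 to +1 is an assumption of this sketch
    return [1 if v >= 0 else -1 for v in relaxed]


def hamming_distance(code_a, code_b):
    """Count the positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


if __name__ == "__main__":
    # Toy 4-bit example (a real code in the paper has 128 bits)
    a = encode([0.9, -1.2, 0.3, -0.1])
    b = encode([0.7, -0.8, -0.2, -0.4])
    print(a, b, hamming_distance(a, b))  # [1, -1, 1, -1] [1, -1, -1, -1] 1
```

A smaller Hamming distance then suggests that two images are more likely to belong to the same class.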

Table 1. Structures of simplified CNN-F and original CNN-F.

3.2 Definition of Loss Function

In neural networks, the effects and optimization goals of the model are defined by the loss function. On the one hand, in DHN, quantization errors are inevitably generated when sgn function is used for encoding. It is necessary to consider the quantization loss in the loss function. The form of quantization loss can be defined as Eq. (1) [26].

$$ L_{d} = \sum\limits_{i = 1}^{N} \frac{1}{2} \left\| \left| h_{i} \right| - \mathbf{1} \right\|_{2} $$
(1)

where \( h_{i} \) is the encoding result of image \( g_{i} \), \( \left| \bullet \right| \) denotes the element-wise absolute value operation, \( \mathbf{1} \) is a vector of all ones, and \( \left\| \bullet \right\|_{2} \) denotes the L2-norm of a vector.
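As a rough illustration of Eq. (1), the sketch below evaluates the quantization loss for a list of real-valued network outputs. It assumes that the loss is computed on the tanh outputs before the sgn step (after sgn every entry would already be ±1 and the loss would vanish); the function name `quantization_loss` is hypothetical.

```python
import math


def quantization_loss(outputs):
    """Eq. (1): sum over images of 0.5 * || |h_i| - 1 ||_2.

    `outputs` is a list of real-valued vectors, assumed here to be the
    tanh outputs of the last layer before binarisation.
    """
    total = 0.0
    for h in outputs:
        # L2-norm of the element-wise gap between |h| and the all-ones vector
        total += 0.5 * math.sqrt(sum((abs(v) - 1.0) ** 2 for v in h))
    return total
```

Outputs that are already close to −1 or +1 give a loss near zero, so minimising this term pushes the network towards outputs that lose little information when binarised.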

On the other hand, the goal of optimization is that the codes of hand dorsal vein images from the same category are as similar as possible, while those from different classes are far apart. Based on this goal, another loss, the hash loss, is defined. Following the method in [26], for two images \( g_{i} \) and \( g_{j} \) with corresponding hash codes \( h_{i} \) and \( h_{j} \), the hash loss between them is defined as Eq. (2).

$$ L_{h} (h_{i} ,h_{j} ,r_{ij} ) = \frac{1}{2}r_{ij} D_{h} (h_{i} ,h_{j} ) + \frac{1}{2}(1 - r_{ij} )\max (T - D_{h} (h_{i} ,h_{j} ),0) $$
(2)

where \( D_{h} (h_{i} ,h_{j} ) \) indicates the distance between \( h_{i} \) and \( h_{j} \), and \( r_{ij} \) denotes the correlation between images \( g_{i} \) and \( g_{j} \). If two images come from the same class, they have a strong correlation, so \( r_{ij} = 1 \); otherwise \( r_{ij} = 0 \). Eq. (2) can be divided into two parts. The former ensures that the distance between images of the same class is as small as possible, and the latter ensures that the distance between images of different classes is as large as possible [27]. In order to balance the two parts of the loss, a threshold T is set to limit the distance between two images. When \( D_{h} (h_{i} ,h_{j} ) > T \) for a pair from different classes, the two codes are considered sufficiently far apart and the pair contributes no loss. In training, assuming there are a total of N images, the total hash loss is:

$$ L_{h} = \sum\limits_{i = 1}^{N - 1} {\sum\limits_{j = i + 1}^{N} {L_{h} (h_{i} ,h_{j} ,r_{ij} )} } $$
(3)
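Eqs. (2) and (3) can be sketched as follows. The paper does not specify the distance \( D_{h} \) here, so the squared Euclidean distance used by default is an assumption of this sketch, as are the function names; any other distance can be passed through the `dist` argument.

```python
def squared_euclidean(h_i, h_j):
    """Assumed choice for D_h; the paper does not fix the distance."""
    return sum((a - b) ** 2 for a, b in zip(h_i, h_j))


def pair_hash_loss(h_i, h_j, r_ij, T, dist=squared_euclidean):
    """Eq. (2): pull same-class pairs together, push other pairs beyond T."""
    d = dist(h_i, h_j)
    return 0.5 * r_ij * d + 0.5 * (1 - r_ij) * max(T - d, 0.0)


def total_hash_loss(codes, labels, T, dist=squared_euclidean):
    """Eq. (3): sum the pairwise loss over all unordered pairs i < j."""
    n = len(codes)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            r_ij = 1 if labels[i] == labels[j] else 0  # same class -> 1
            total += pair_hash_loss(codes[i], codes[j], r_ij, T, dist)
    return total
```

For example, with the toy codes `[[1, 1], [1, -1], [-1, -1]]`, labels `[0, 0, 1]`, and `T = 6`, the same-class pair contributes 2, the different-class pair at distance 8 contributes 0 because it already exceeds T, and the different-class pair at distance 4 contributes 1.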

Therefore, the total loss function contains two parts, the quantization loss and the hash loss. The two parts are combined by a weight \( w \), as shown in Eq. (4).

$$ L = wL_{d} + L_{h} $$
(4)

4 Experiments and Results

In order to evaluate the performance of the HBD algorithm, we conducted experiments on the NCUT [15] and GPDS [16] hand dorsal vein databases. The NCUT database was established by the North China University of Technology, and the GPDS database was collected by the GPDS group at the University of Las Palmas de Gran Canaria, Spain.

4.1 Databases and Preprocessing

  • The NCUT database contains three sections: parts A, B, and C. Part A, the most widely used by researchers, contains hand dorsal vein images from 102 individuals, including 50 males and 52 females. For each individual, 10 images were collected from the right hand and 10 from the left hand. Each image in NCUT is a Near-Infrared (NIR) image of 640 × 480 pixels, which contains the complete back of the hand.

  • The GPDS database has 1030 hand dorsal vein images collected from 103 people. During the acquisition process, the hand was illuminated by two arrays of 64 LEDs with a wavelength of around 850 nm. A cylindrical handle with two pegs for positional reference was used to fix the hand so that the rotation angle was not too large. A CCD camera with an attached Infrared Radiation (IR) filter acquired a 1600 × 1200 pixel 8-bit greyscale image of the hand dorsum [16].

Due to the influence of hand placement angle and noise during acquisition, preprocessing was first performed, mainly including noise reduction and region of interest (ROI) extraction. In this study, mean and median filters were used to perform noise reduction, and then the maximum inscribed circle of the hand region was extracted as the ROI. In the end, each image was uniformly resized to 128 × 128 and input into the neural network, as shown in Fig. 2.

Fig. 2. Original image (a), ROI region (b), and extracted ROI (c) in NCUT database; original image (d), ROI region (e), and extracted ROI (f) in GPDS database.
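Two of the preprocessing steps described above, median filtering and locating the maximum inscribed circle of the hand region, can be sketched in plain Python. This is a brute-force illustration intended for small arrays, not the authors' implementation: the 3 × 3 window, the border handling, the assumption that a binary hand mask is already available, and both function names are choices made for this sketch. The mean filter and the resizing to 128 × 128 are omitted.

```python
import math
from statistics import median


def median_filter3(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median(window)
    return out


def max_inscribed_circle(mask):
    """Return (row, col, radius) of the largest circle inside the foreground.

    `mask` is a 2D list of 0/1 values (1 = hand region). The radius at a
    pixel is its distance to the nearest background pixel, with everything
    outside the image treated as background.
    """
    h, w = len(mask), len(mask[0])
    background = [(y, x) for y in range(h) for x in range(w) if not mask[y][x]]
    best = (0, 0, 0.0)
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            r = min(y + 1, x + 1, h - y, w - x)  # distance to just outside the image
            for by, bx in background:
                r = min(r, math.hypot(y - by, x - bx))
            if r > best[2]:
                best = (y, x, r)
    return best
```

For images of realistic size, a distance transform from an image-processing library would be a more practical way to obtain the same circle.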

4.2 Experiments and Result Analysis

In the experiments, data samples were divided into two parts: training set (G) and test set (P). The training and test sample sizes had a great influence on the experiments. Here, the ratio of training to test samples was set to 5:5 and 7:3. In addition, three database configurations were used, namely NCUT, GPDS, and NCUT+GPDS, which contain 204 (two hands for each of the 102 NCUT individuals), 103, and 307 categories, respectively. During training, an exponentially decaying learning rate was used, the parameter T was set to 180, and the weight w was set to 0.5. The preprocessed hand dorsal vein image was input into the network described in Sect. 3. After many iterations, the network parameters were trained to their best values.
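An exponentially decaying learning rate of the kind mentioned above is commonly written as the initial rate multiplied by a decay factor raised to the number of elapsed decay periods. The sketch below shows that common form; the paper does not report the initial rate, decay factor, or decay interval, so every value in the example call is a placeholder.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` updates: initial_lr * decay_rate^(step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


if __name__ == "__main__":
    # Placeholder hyperparameters, not values from the paper
    for step in (0, 100, 200):
        print(step, exponential_decay(0.01, 0.5, 100, step))
```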

During testing, each image in the test set was matched with every image of the same class in the training set as a genuine match, and with every image of a different class as an imposter match. Therefore, for the NCUT database, a total of 5100 (5 × 5 × 204) genuine matches and 1035300 (5 × 203 × 5 × 204) imposter matches were generated when G:P = 5:5, and 4284 (3 × 7 × 204) genuine matches and 869652 (3 × 203 × 7 × 204) imposter matches when G:P = 7:3. For the GPDS database, there were a total of 2575 (5 × 5 × 103) genuine matches and 262650 (5 × 102 × 5 × 103) imposter matches when G:P = 5:5, and 2163 (3 × 7 × 103) genuine matches and 220626 (3 × 102 × 7 × 103) imposter matches when G:P = 7:3. For the NCUT+GPDS database, a total of 7675 (5 × 5 × 307) genuine matches and 2348550 (5 × 306 × 5 × 307) imposter matches were generated when G:P = 5:5, and 6447 (3 × 7 × 307) genuine matches and 1972782 (3 × 306 × 7 × 307) imposter matches when G:P = 7:3. The settings of the test sets are shown in Table 2.

Table 2. Settings of test set on different databases.
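The match counts quoted above follow directly from the number of classes and the per-class split of the 10 images. A small sketch makes the arithmetic explicit; the function name `match_counts` is chosen here for illustration.

```python
def match_counts(num_classes, train_per_class, test_per_class):
    """Return (genuine, imposter) match counts for a per-class train/test split."""
    # Each test image is compared with every training image of its own class
    genuine = test_per_class * train_per_class * num_classes
    # and with every training image of each of the other classes
    imposter = test_per_class * (num_classes - 1) * train_per_class * num_classes
    return genuine, imposter


if __name__ == "__main__":
    for name, classes in (("NCUT", 204), ("GPDS", 103), ("NCUT+GPDS", 307)):
        print(name, match_counts(classes, 5, 5), match_counts(classes, 7, 3))
```

Calling `match_counts(204, 5, 5)` returns `(5100, 1035300)`, which matches the NCUT figures for G:P = 5:5.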

After the encoded data sets were obtained, the Hamming distances of the genuine and imposter matches were calculated. By setting a threshold on this distance, genuine and imposter matches could be distinguished and the identification was completed. The outputs were then checked against the known class labels. Finally, the Receiver Operating Characteristic (ROC) curves of the test sets were drawn, as shown in Fig. 3. The results of the HBD algorithm on different databases are shown in Table 3. For G:P = 5:5 and 7:3, respectively, the EERs of the test sets were 0.50% and 0.08% on NCUT, 1.11% and 0.43% on GPDS, and 1.20% and 0.60% on NCUT+GPDS, which shows that the HBD algorithm obtains satisfactory accuracy in hand dorsal vein recognition. On NCUT with G:P = 7:3, the accuracy was the highest and the EER dropped to 0.08%. In addition, it can be seen that the accuracy on GPDS is lower than that on NCUT. This is attributed to the limited number of samples in GPDS and its lower image quality. At the same time, the EER on NCUT+GPDS remains low, indicating that HBD can identify images captured by different devices.

Fig. 3. ROCs of the test set in different databases: (a) and (d) NCUT; (b) and (e) GPDS; (c) and (f) NCUT+GPDS.

Table 3. Results of HBD algorithm on different databases.
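The thresholding of Hamming distances and the resulting EER can be sketched as follows. This is one common way to estimate the EER from two lists of distances, not necessarily the procedure used to produce Table 3: the sweep over observed distances, the convention that a distance at or below the threshold is accepted, and the function names are assumptions of this sketch.

```python
def far_frr(genuine, imposter, threshold):
    """False accept and false reject rates when distances <= threshold are accepted."""
    far = sum(d <= threshold for d in imposter) / len(imposter)
    frr = sum(d > threshold for d in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, imposter):
    """Return (eer, threshold) at the observed distance where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(imposter)):
        far, frr = far_frr(genuine, imposter, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]
```

With the toy distances `genuine = [2, 3, 5, 9]` and `imposter = [8, 20, 25, 30]`, the two error rates meet at a threshold of 8, where one genuine pair is rejected and one imposter pair is accepted, giving an EER of 0.25.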

Comparison with the State-of-the-Art Methods.

For comparison, we used two traditional methods, the Iterative Closest Point (ICP) and BGM algorithms, to conduct comparative tests on the NCUT database. After preprocessing by filtering, vein segmentation, refinement, and vein structure extraction, feature vectors were obtained by using the ICP and BGM algorithms, respectively. Based on the feature vectors, final recognition was performed by using Kernel Density Estimation (KDE) and SVM. The recognition results are shown in Table 4. Table 4 also lists recent deep learning methods for hand dorsal vein recognition that were evaluated on the NCUT database.

Table 4. Results of hand dorsal vein recognition on NCUT database in recent years.

Compared with the non-deep learning methods, HBD obtains better recognition results, which reflects the high reliability of deep learning for biometric identification. Compared with other deep learning methods, HBD achieves comparable accuracy. When G:P = 7:3, its recognition rate is much higher than theirs. Moreover, the models proposed in [23] and [24], such as VGG-16 and VGG-19, are so complex that their requirements for training time and hardware platform are particularly stringent. The structure of the proposed HBD is relatively simple, with only four convolution layers. At the same level of accuracy, its requirements on the operating environment are much lower. At the same time, the use of hash coding further speeds up the operation and improves the recognition efficiency.

5 Conclusion

This paper applies DHN to the recognition of the hand dorsal vein and proposes the HBD method. After preprocessing, the hand dorsal vein image is input into SCNN-F. Then the sgn function is used to encode the output of the last network layer as −1 or +1. Finally, each image is encoded as a 128-bit code. By comparing the distances between hash codes, it can be judged whether two images belong to the same class, so as to complete the identification. The advantage of hash coding is that the similarity of two images can be easily obtained by calculating the distance between their codes. Experiments on the NCUT, GPDS, and NCUT+GPDS databases were performed to evaluate the proposed method. For comparison, two traditional identification algorithms, ICP and BGM, were tested on the NCUT database. The experimental results show that the proposed algorithm achieves higher accuracy than traditional non-deep learning methods. Besides, compared with other deep learning methods evaluated on the NCUT database in recent years, our method obtains the same level of accuracy and reduces the EER by an order of magnitude when G:P = 7:3. More importantly, since the structure of HBD is much simpler than that of others such as VGG-Net, it can operate faster and more efficiently while maintaining the same accuracy.