
1 Introduction

In biometric recognition, the capture of iris images has traditionally relied on sensors working in the near infrared (NIR) spectrum. This is mainly because at these wavelengths it is possible to capture a better iris textural pattern and its internal structures regardless of iris color. The use of NIR sensors has largely avoided the problem of low contrast between the iris and the pupil in people with dark colored irises. However, the dependence on NIR sensors has restricted the wider adoption of this biometric modality, which has proven to be very accurate. Sensors used in video surveillance or access control applications generally work in the visible spectrum (VS). Moreover, in recent years there has been a substantial increase in the use of this type of sensor in mobile devices for biometric applications integrated into them. Iris biometrics has not been exempt from this phenomenon, and applications of this type already exist [1], so it is important to address this topic given the problems that affect this modality and the wide spectrum of applications that can be developed. Video as a modality for obtaining iris images in real-time applications acquires great relevance in the current context of VS sensors in mobile devices and video surveillance cameras [2, 3], because from video it is possible to obtain more information about the iris region.

In this work, an approach for video iris recognition is proposed; it is based on a scheme for quality evaluation of the iris image while processing the video capture. For this purpose, a measure of iris image quality is proposed that takes into account the elements defined in the ISO/IEC 19794-6:2005 standard [4]. As part of this scheme we also propose a segmentation method that detects the iris directly from video frames and combines this process with the semantic classification of the iris image to generate the segmentation mask. This combination ensures that iris images are extracted while avoiding the elements that negatively influence the identification process, such as closed eyes and off-angle gaze. The work is structured as follows: Sect. 2 discusses related works, Sect. 3 presents the proposed approach, Sect. 4 presents and discusses the experimental results, and finally the conclusions are drawn.

2 Related Works

The quality evaluation of iris images is one of the recently identified topics in the field of iris biometry [5, 6]. In general, quality metrics are used to decide whether the image should be discarded or processed by the iris recognition system.

The quality of iris images is determined by many factors, depending on the environmental and camera conditions and on the person to be identified [5]. Some of the quality measures reported in the literature [6] focus on evaluating iris images after the segmentation process, which results in feeding the system with a significant number of low quality images. Another problem of these approaches is that the evaluation of iris image quality is reduced to the estimation of only one or two factors [3], such as out-of-focus blur, motion blur, and occlusion. Other authors [6, 7] use more than three factors to evaluate iris image quality, such as the degree of defocus, blurring, occlusion, specular reflection, lighting and off-angle gaze. They consider that the degradation of any of the estimated parameters below its threshold brings the measure that integrates all the evaluated criteria to zero (veto power). This may be counterproductive in systems where the capture conditions are not optimal, preventing images that still have identification value from entering the system.

In [8] a quality measure was proposed that estimates the quality of the iris image within the whole eye image captured in video by a VS camera. This measure considers properties established in ISO/IEC 19794-6:2005 [4] and combines them with methods of image sharpness estimation. However, the estimation is carried out on the whole eye image and not on the specific region of the iris. Additionally, the degree of diversity of the iris texture is not considered.

We believe that a quality measure that considers the parameters established in the standard [4], combines them with image sharpness estimation and the degree of diversity of the iris texture, and evaluates the detected iris image before segmentation can help reduce errors in the subsequent steps of the system, with a consequent increase in recognition rates.

Recently, following the great success of image classification methods based on deep learning, several iris segmentation methods have been proposed; Fully Convolutional Networks (FCN) have achieved the best results [9, 10]. The main limitation of these methods is that they classify the image into only two classes (iris and non-iris), without taking into account other elements in the iris image, such as eyelashes and specular reflections, that negatively influence the subsequent identification process. In [11] the authors presented a semantic iris segmentation method based on a hierarchical Markov Random Field classifier. The experimental results did not significantly outperform the best state-of-the-art methods, but the authors showed that semantic segmentation was a promising route for iris recognition in the VS.

In [12] a multi-class approach for iris segmentation was presented. This method exploits the information of different semantic classes in an eye image (sclera, iris, pupil, eyelids and eyelashes, skin, hair, glasses, eyebrows, specular reflections and background regions) by means of an FCN. The experimental results of this work showed that, for iris segmentation, using the information of different semantic classes in the eye image is better than iris/non-iris segmentation. However, applying this method in a video-based system is not feasible, since classification is performed on the previously detected whole eye image, which leads to a high computation time.

3 Proposed Approach

Figure 1 shows the general scheme of the proposed approach. Our proposal is based on direct iris image detection, combined with a new quality metric and semantic iris classification into five classes (iris, pupil, sclera, eyelashes and specular reflections).

Fig. 1. General scheme of the proposed approach

The application of this approach in a video-based iris recognition system can ensure that the detected iris images are free of factors that negatively influence identification (poor illumination, lack of sharpness, blur, off-angle gaze, occlusion and low pixel density). On the other hand, applying the classification only to the iris image and not to the whole eye image reduces the computational time of classification, which allows its use in video-based systems.

3.1 Iris Video Capture and Detection

In [13] the use of white LED light is proposed as a way to attenuate the low contrast between dark colored irises and the pupil when eye images are captured by mobile devices (cell phones) in the VS. The authors demonstrated that their proposal increases recognition rates to levels similar to those reached by a system working with NIR images. For the implementation of our proposal a similar device was designed, but replacing the cell phone with a webcam for video capture.

For the direct detection of iris images in video, a detector was trained with the classic Viola and Jones algorithm [14]. The image representation called the integral image allows very fast computation of the features used by the detectors. The AdaBoost-based learning algorithm selects a small number of features from the initial set and builds a cascade of simple classifiers to discriminate them. A classifier was trained to detect the iris region within the eye image. This region is the bounding box that encompasses the iris and the pupil (see Fig. 1).
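To illustrate why detection with Haar-like features is fast, the following is a minimal sketch of the integral image and the constant-time box sums it enables (the function names are ours, not from [14]):

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of all pixels above and to the left of (x, y), inclusive.
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, x0, y0, x1, y1):
    # Sum over the rectangle [y0..y1] x [x0..x1] with only four table
    # lookups, which is what makes Haar-like feature evaluation so cheap.
    s = ii[y1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0 and y0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s
```

Once the integral image is built, every rectangular feature of the cascade costs a fixed number of lookups regardless of its size.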

The training set consisted of 3300 positive samples: 800 from MobBio [15], 800 from UTIRIS [16] and 1700 from frames of an iris video dataset of our own (see Sect. 4.2). The training set was prepared manually by selecting the rectangular regions enclosing the iris region, scaled to a size of 24 × 24 pixels. The learning algorithm for iris detection is stump-based Gentle AdaBoost. The detector was tested on a set of 20 videos from our database (see Sect. 4.2), detecting both irises of each individual in 100% of the cases (at least one pair of irises is detected) with a false detection rate of 0.01%.

3.2 Image Quality Evaluation

One of the most important parameters indicated in the standard [4] is the focal distance, which indicates the optimal distance between subject and sensor for a normed pixel density. Pixel density is defined in the standard as the number of pixels along the diagonal of the iris image. The normed pixel density should be at least 200 pixels, with at least two pixel lines per millimeter (2 lppmm). From the medical literature, it can be assumed that the iris represents approximately 60% of the detected bounding box of the iris region in a well illuminated environment. The pixel density of an iris image (Ird) can then be calculated with the classical Pythagorean theorem by Eq. 1, where w and h are the width and height in pixels of the detected image.

$$ Ird = 0.6\left( {\sqrt {w^{2} + h^{2} } } \right) $$
(1)
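Eq. 1 translates directly to code; a minimal sketch follows (the function name and the explicit threshold constant are ours):

```python
import math

# ISO/IEC 19794-6:2005 requires at least 200 pixels across the iris diagonal.
TIRD = 200.0

def iris_pixel_density(w, h):
    # Eq. 1: the iris is assumed to span ~60% of the diagonal of the
    # detected bounding box of width w and height h (in pixels).
    return 0.6 * math.sqrt(w ** 2 + h ** 2)
```

For example, a detected 300 × 400 bounding box yields Ird = 0.6 · 500 = 300 pixels, above the 200-pixel threshold.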

3.3 Quality Measure for Iris Images

One of the elements that most negatively influences the quality of iris images is the degree of image sharpness: depending on it, the image may be blurred or out of focus, and the iris texture loses the details of the structures that make identification possible. An evaluation process for NIR eye images using the Kang and Park filter is proposed in [3]. The filter isolates the high frequencies, and their total power is then estimated based on Parseval's theorem, which establishes that power is conserved between the spatial and frequency domains. The filter is a 5 × 5 convolution kernel composed of three functions of 5 × 5 and amplitude −1, one of 3 × 3 and amplitude +5 and four of 1 × 1 and amplitude −5. This kernel estimates the high frequencies of NIR iris texture better than other state-of-the-art operators, while having a low computational cost due to its reduced size. We therefore expect it to behave similarly for iris images captured in the VS.
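A sketch of how the sharpness term would be computed: the exact Kang and Park coefficients are given in [3] and are not reproduced here, so the kernel below is a zero-sum high-pass stand-in of the same 5 × 5 size, used purely to illustrate the computation of the mean filter response (kpk):

```python
import numpy as np

# Placeholder 5x5 zero-sum high-pass kernel standing in for the Kang and
# Park kernel of [3]; these coefficients are illustrative only.
HP_KERNEL = np.array([[-1, -1, -1, -1, -1],
                      [-1,  1,  2,  1, -1],
                      [-1,  2,  4,  2, -1],
                      [-1,  1,  2,  1, -1],
                      [-1, -1, -1, -1, -1]], dtype=float)

def kpk_response(img, kernel=HP_KERNEL):
    # Mean absolute high-frequency response over all 'valid' positions:
    # near zero on flat or blurred regions, large on sharp texture.
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return float(np.abs(out).mean())
```

Because the kernel sums to zero, constant (texture-free) regions produce no response, so the score isolates high-frequency detail.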

The entropy of an iris image has proven to be a good indicator of its amount of information [17]. The entropy depends only on the number of gray levels and the frequency of each gray level.

Taking these elements into account, and considering that the pixel density of the iris image is another fundamental element of image quality, we propose combining it with the Kang and Park filter and the entropy estimation to obtain a measure of iris image quality (Qiris). The proposed Qiris is obtained by Eq. 2.

$$ Qiris = \frac{Ird \cdot kpk \cdot ent}{tird \cdot tkpk \cdot tent} $$
(2)

where kpk is the average value of the image pixels resulting from the convolution of the input iris image with the Kang and Park kernel; tird is the threshold established by the standard [4] for the minimum Ird of a quality image; tkpk is the estimated threshold of kpk for a quality image (in [3] the authors recommend, from their experimental results, a threshold of 15); ent is the entropy of the iris image; and tent is the estimated threshold of ent for a quality image. The entropy of an iris image is calculated by Eq. 3, where pi is the occurrence probability of a given gray level inside the iris region and n is the number of gray levels in the image.

$$ ent = - \mathop \sum \limits_{i = 1}^{n} p_{i} \log_{2} p_{i} $$
(3)
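A minimal sketch of Eq. 3 for an 8-bit iris region (the sum runs over the occupied gray levels only, since empty bins contribute nothing):

```python
import numpy as np

def iris_entropy(img):
    # Eq. 3: Shannon entropy of the gray-level distribution of the iris
    # region; img is an 8-bit image given as a non-negative integer array.
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]  # 0 * log2(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```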

Experimentally, we verified that iris images with quality according to the international standard have an entropy greater than 4. For this experiment, we took a set of 300 images normalized to a size of 260 × 260 from the MobBio [15] (150 images) and UTIRIS [16] (150 images) databases and evaluated their quality by pixel density and the response of the Kang and Park filter. The results established that the images with values above the corresponding thresholds have an entropy equal to or greater than 4, so we adopt this value for tent. The values that Qiris can reach depend on the thresholds selected for Ird, kpk and ent. Thus, considering the threshold tird = 200 established by the standard, tkpk = 15 obtained experimentally in [3] and tent = 4 obtained experimentally by us, the minimum value of Qiris for a quality eye image is 1; higher values denote higher quality images, while values below 1 denote images with quality below the standard. An open question is to determine the minimum value of Qiris at which acceptable recognition accuracies can still be obtained for a given system configuration.
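With the three thresholds fixed as above, Eq. 2 reduces to a one-line score; a minimal sketch (the defaults encode tird = 200, tkpk = 15 and tent = 4):

```python
def q_iris(ird, kpk, ent, tird=200.0, tkpk=15.0, tent=4.0):
    # Eq. 2: Qiris = 1 when every factor sits exactly at its threshold;
    # larger values indicate higher quality, values below 1 fall short
    # of the standard.
    return (ird * kpk * ent) / (tird * tkpk * tent)
```

For instance, an image sitting exactly at the three thresholds gives Qiris = 1.0, and doubling the pixel density alone gives 2.0.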

3.4 Iris Semantic Classification for Segmentation

The main motivation for using the semantic classification of iris image is that the introduction of semantic information can avoid or diminish the false positive classifications of pixels that do not belong to the iris. The objective of the iris semantic classification is to assign to each image region a label of a class describing an anatomical eye structure (iris, pupil or sclera). From a computational point of view this is expressed by assigning the value of the predefined class to each image pixel.

Given the promising results obtained in [12], we decided to use a training scheme in which a trained FCN was adapted to the semantic classification of the iris through inductive transfer learning [18]. We selected the fcn8s-at-once FCN model, fine-tuned from the pre-trained VGG-16 model [19]. The fine-tuning process allows us to select the most promising hypothesis space to fit the target knowledge. The fcn8s-at-once architecture was implemented in Caffe [20]. Intermediate upsampling layers were initialized with bilinear interpolation. The number of model outputs was set according to the five classes used. The difference is that in our proposal the classification is performed directly on the iris image (Fig. 2b), unlike the proposal of [12], where it is performed on the whole eye image (Fig. 2a). This significantly reduces computation time with respect to [12], due to the smaller size of the image to be classified and the smaller number of classes to consider.

Fig. 2. Segmentation examples: (a) by [12]; (b) by the proposed method

Due to the need to normalize the iris texture with the rubber-sheet model, we combine the pixel-based algorithm (FCN) with an edge-based approach that detects the center coordinates and radii of the iris and pupil regions.

4 Experimental Results and Discussion

In order to validate the proposal, our experimental design was aimed at verifying the influence of the proposed approach in the iris segmentation and verification tasks by evaluating it using two benchmark iris image databases and an iris video dataset of our own. We compared the performance of our proposal with results obtained using the FCN [12].

The implemented experimental pipeline comprises three basic functionalities. Iris image acquisition and image segmentation: these two modules are based on the approach described in the previous sections. Feature extraction and comparison: for the experiments in this work, we combined two feature extraction methods in order to verify the robustness of the proposed approach with respect to the features used for recognition: Scale-Invariant Feature Transform (SIFT) [21] and Uniform Local Binary Patterns (LBP) [22]. For the comparison we used the dissimilarity score corresponding to each method. The first estimates the dissimilarity between two sets of SIFT key points [21]. The second uses the Chi-square distance [22], a non-parametric measure of similarity between images of a class. In both cases, the minimum distance found between two images gives the measure of maximum similarity between them.
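As an illustration of the LBP comparison step, a minimal sketch of the Chi-square distance between two LBP histograms (the small epsilon guarding against empty bins is our addition):

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    # Chi-square dissimilarity between two LBP histograms: 0 for identical
    # histograms, growing as the texture distributions diverge.
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return float(0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))
```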

4.1 Iris Datasets

MobBio [15] is a multi-biometric dataset including face, iris and voice samples of 105 volunteers. The iris subset contains 16 VS images of each individual at a resolution of 300 × 200. The UTIRIS dataset [16] is a VS iris biometric dataset composed of 1540 images from 79 individuals.

Our database (DatIris) consists of 82 videos of 41 people taken in two sessions of 10 s each at a distance of 0.55 m. The camera was a Logitech C920 HD Pro webcam at a resolution of 1920 × 1080. The videos were taken indoors with ambient lighting and the presence of specular reflections, to achieve an environment closer to the poorly controlled conditions of a real biometric application. The database contains videos of light-skinned people of Caucasian origin, dark-skinned people of African origin and people of mixed origin; it comprises 26 men and 15 women, with ages ranging from 10 to 65 years (see image examples in Fig. 3).

Fig. 3. Samples of eye images from the MobBio, UTIRIS and DatIris datasets

4.2 Experimental Results

The accuracy of the proposed segmentation stage was measured by comparing the results of the FCN approach [12] with our proposal on the MobBio database. This database provides manual annotations of both the limbic and pupillary contours for every image; 200 images with Qiris > 1 were chosen and evaluated in order to guarantee the quality of the experiment. As evaluation metric we used the E1 error proposed by the NICE.I protocol (http://nice1.di.ubi.pt/), which estimates the proportion of disagreeing pixels. The average time (in seconds) that the segmentation method takes to obtain a segmented iris image was measured on a PC with an Intel Core i5-3470 processor at 3.2 GHz and 8 GB of RAM. Table 1 lists the E1 obtained by the FCN [12] and by our approach. The results show similar performance in terms of E1 for both methods; however, our proposal achieves a substantial reduction of the computation time of the segmentation process.

Table 1. Experimental results of segmentation stage for MobBio dataset

The influence of our proposal on the accuracy of the verification task was estimated by the false rejection rate (FRR) at a false acceptance rate (FAR) ≤ 0.001%. Comparisons were made in an all-against-all fashion to obtain the genuine and impostor distributions. For the DatIris dataset, the 41 videos of session 1 were processed, taking two images of each iris and comparing them against a database composed of two images of each iris taken from frames of the session 2 videos.
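The FRR-at-FAR operating point can be sketched as follows for dissimilarity scores (lower means more similar); the threshold-selection rule below is our simplification of the usual procedure:

```python
import numpy as np

def frr_at_far(genuine, impostor, far_target=1e-5):
    # Pick the largest dissimilarity threshold at which at most far_target
    # of the impostor comparisons are (wrongly) accepted, then report the
    # fraction of genuine comparisons rejected at that threshold.
    # far_target=1e-5 corresponds to the FAR <= 0.001% used in the text.
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.sort(np.asarray(impostor, dtype=float))
    k = int(np.floor(far_target * len(impostor)))
    thr = impostor[k - 1] if k >= 1 else -np.inf  # no acceptance possible
    return float(np.mean(genuine > thr))
```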

Table 2 shows the comparison of the FRR at FAR ≤ 0.001% obtained with the FCN approach [12] and with our proposal on the three experimental datasets, using two different Qiris intervals to accept or reject the iris images to be processed. The results show that, as the Qiris threshold increases, the system retains high quality images and rejects low quality ones. This increase in quality results in a decrease in the FRR, with a notable result on UTIRIS, where FRR = 0.04 is achieved with 79.5% of the database at Qiris > 1 using LBP features. However, on MobBio, increasing the Qiris threshold results in a significant decrease in the number of images available for comparison. The results obtained on the DatIris dataset show a high performance of the verification process, with an FRR of 0.01, which corroborates the relevance of the proposed quality index in a real application.

Table 2. Comparison of the FRR obtained by the implemented system using the FCN approach [12] and our proposal.

It is also observed that the FRR levels remain similar for both methods, which shows that restricting the semantic classification to the iris region reduces the computation time without affecting the accuracy of the recognition process.

Table 3 compares the quality measure proposed in [8] (Qindex) with our proposal (Qiris) at values ≥ 1, using LBP features and the proposed segmentation method. By including the entropy of the iris image, the system admits a greater percentage of images for processing while maintaining similar levels of recognition accuracy.

Table 3. Comparison of the proposed quality measure in [8] (Qindex) with our proposal (Qiris)

5 Conclusions

In this paper, we proposed a video iris recognition approach based on image quality evaluation and semantic classification for biometric iris recognition in the VS. It combines automatic detection methods, a new image quality measure and semantic classification. We analyzed the relevance of the image evaluation stage as a fundamental step to filter the information generated by the iris video capture. The experimental results showed that including the proposed quality measure before iris segmentation limits the passage of low quality images into the system, which results in an increase in recognition rates. Applying the semantic classification directly to the iris image during video capture reduces the computational time of the segmentation process while maintaining the recognition accuracy achieved by the state-of-the-art FCN-based method. In future work we will evaluate the use of deep neural networks for iris detection in video.