1 Introduction

Many biometric systems have been proposed in the literature to control access to secure systems [1, 2]. These include fingerprint, face, voice, iris, and palm recognition [1, 3]. Face recognition is one of the most studied biometric systems [4]. In security and surveillance applications, a high recognition rate is mandatory.

Recently, machine learning algorithms have achieved high accuracy in face recognition systems [5]. Machine learning systems have two building blocks: the data and the algorithm. However, machine learning based face recognition methods require a large number of labeled samples, which are expensive and time-consuming to collect. The performance of these methods often improves with the amount and quality of the available data.

There are two ways to obtain a large amount of data: collecting face data from users, and augmenting the limited data that is available. Masi et al. [6] discussed the need to collect huge numbers of face images for effective face recognition, and proposed an augmentation method that enriches an existing dataset by introducing face appearance variations in pose, shape, and expression. “One-to-many augmentation” methods can mitigate the challenges of data collection and can be used to enlarge datasets [7]. They are categorized into four classes: data augmentation, 3D model, CNN model, and GAN model.

  • Data augmentation: Data augmentation methods consist of photometric and geometric transformations. Transforms include a range of operations from the field of image manipulation, such as shifts, flips and zooms [7];

  • 3D model: To enrich the diversity of training data, different generic 3D models are used for rendering to augment faces;

  • CNN model: Rather than reconstructing a 3D model from a 2D image and projecting it back into 2D images of different poses, CNN models can generate 2D images directly;

  • GAN model: Generative Adversarial Networks (GANs) are also used for image augmentation; they combine prior knowledge of the face data distribution (pose and identity perception loss). Wang et al. [8] compared traditional transformation methods with GANs on the problem of data augmentation in image classification.
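
As an illustration of the first class, the geometric transformations named above (shifts, flips, and zooms) and one simple photometric change can be sketched in a few lines. This is a minimal sketch using NumPy on an image array; the function names, the brightness transform, and the nearest-neighbor resizing are illustrative assumptions and are not taken from [7].

```python
import numpy as np


def horizontal_flip(img):
    """Geometric: mirror the image left to right."""
    return img[:, ::-1]


def shift(img, dx, dy):
    """Geometric: translate by (dx, dy) pixels, zero-filling exposed borders."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # shifted completely out of the frame
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def zoom_in(img, factor):
    """Geometric: center-crop by `factor` (>= 1), then resize back to the
    original size with nearest-neighbor sampling."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    h, w = img.shape[:2]
    ch, cw = max(1, round(h / factor)), max(1, round(w / factor))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h  # source row for each output row
    cols = np.arange(w) * cw // w  # source column for each output column
    return crop[rows][:, cols]


def adjust_brightness(img, factor):
    """Photometric: scale intensities, clipping to the 8-bit range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def one_to_many(img):
    """Return several augmented variants of a single face image."""
    return [
        horizontal_flip(img),
        shift(img, 2, 0),
        zoom_in(img, 1.2),
        adjust_brightness(img, 1.3),
    ]
```

Each function returns an array of the same shape as its input, so the variants can be added to a training set alongside the original image.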

The specific data augmentation techniques used for a training dataset must be chosen carefully considering the context of the training dataset and knowledge of the problem domain. Besides, it can be useful to experiment with data augmentation methods in isolation and test to see if they result in a measurable improvement to model performance, perhaps with a small prototype dataset, model, and training. These techniques are robust but can be computationally intensive.

On the other hand, collecting a real-world face database is expensive and time-consuming. However, real-world data provide better contextual meaning and allow the classifier to learn efficiently. For this reason, a data collection tool can help improve classification accuracy, especially for small and secure systems. In this paper, we propose a data collection and image processing tool that can be used to collect data for facial recognition. The proposed tool was evaluated and updated based on user feedback in three stages. Importantly, the tool captures 96 facial pose variations while keeping user interaction with the system pleasant. The rest of the paper is organized as follows: Sect. 2 describes the stages considered in the data collection process and the main visual and ergonomic parameters. Next, the evolution of the developed tool is presented in detail in Sect. 3, followed by the user evaluation in Sect. 4. The conclusions and future research lines are presented in Sect. 5.

2 Data Collection Process

The user image capturing process may take a few seconds from image acquisition to processing and subsequent use by the recognition system. Depending on the task to be performed, some aspects may affect the result, for example, the distance from which the photo was taken and the capture angle. Different studies identify a necessary number of photos (approximately 30) but do not define the values of the visual and geometric parameters present in the interaction with users [4, 6, 7]. Based on this analysis, we propose a data collection tool built around several visual and ergonomic parameters and requiring minimal interaction; these parameters should be evaluated with users and adjusted for any given recognition system.

The following parameters are considered in each iteration:

  • Participant height: Used to determine the height of the camera from the ground for the final data collection;

  • Camera height: The current height of the camera from ground level;

  • Camera angle: The current angle of the camera with respect to the horizontal axis;

  • Capture stages: The stages of user’s image capture process;

  • Yaw angle: The maximal horizontal rotation of the head by users;

  • Pitch angle: The maximal vertical rotation of the head by users;

  • Number of images: The number of face images captured per user;

  • Discarded faces: The average number of discarded faces per user;

  • Average capture time: The average capture time per user;

  • Worst capture time: The worst capture time per user.
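
For bookkeeping across iterations, the parameters above can be kept in a single record per stage. The following is a minimal sketch; the class name, field names, and units are hypothetical. Only values stated in the text for the last stage are filled in (a camera height of 140 cm, two capture steps, and 96 images), and everything else is left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IterationParameters:
    """One record per iteration; None means 'not reported here'."""
    participant_height_cm: Optional[float] = None
    camera_height_cm: Optional[float] = None
    camera_angle_deg: Optional[float] = None
    capture_stages: Optional[int] = None
    max_yaw_deg: Optional[float] = None
    max_pitch_deg: Optional[float] = None
    num_images: Optional[int] = None
    avg_discarded_faces: Optional[float] = None
    avg_capture_time_s: Optional[float] = None
    worst_capture_time_s: Optional[float] = None

    def missing(self) -> List[str]:
        """Names of the parameters that still have to be measured."""
        return [name for name, value in vars(self).items() if value is None]


# Values taken from the text for the last stage; the rest stay unset.
last_stage = IterationParameters(
    camera_height_cm=140.0,
    capture_stages=2,  # assumption: the two registration steps
    num_images=96,
)
print(last_stage.missing())
```

Keeping unset values explicit makes it easy to see which measurements an iteration still lacks before comparing it with the previous one.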

Mennesson et al. [9] showed that the degree of head yaw rotation is very important for face detection (e.g., ±15\(^{\circ }\)). The authors further noted that the number of detected faces decreases to zero with a Gaussian decay as the user pose moves away from the frontal face. Evidently, the maximum number of detected frontal faces is obtained with a yaw angle near zero degrees (a frontal face).
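
To make the Gaussian decay concrete, the relative number of detected faces can be modeled as a function of the yaw angle. This is only an illustrative sketch: the function name and the spread parameter `sigma_deg` are assumptions, and no value of the spread is reported here from Mennesson et al. [9], so it must be supplied by the caller.

```python
import math


def relative_detection_rate(yaw_deg, sigma_deg):
    """Gaussian decay in yaw: 1.0 for a frontal face (yaw = 0),
    tending to 0 as the absolute yaw angle grows.

    sigma_deg is a hypothetical spread parameter (in degrees) that
    would have to be fitted to real detection counts.
    """
    if sigma_deg <= 0:
        raise ValueError("sigma_deg must be positive")
    return math.exp(-(yaw_deg ** 2) / (2.0 * sigma_deg ** 2))


# Example with a placeholder spread; the rate is symmetric in yaw.
for yaw in (0, 15, 30, 45):
    print(yaw, relative_detection_rate(yaw, sigma_deg=15.0))
```

Under this model the rate is maximal at a frontal pose and symmetric for left and right rotations, which matches the qualitative behavior described above.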

Visual and ergonomic concepts were studied to facilitate data collection. During the process, three major challenges need to be addressed:

  1. To guide the user naturally considering comfort while moving his head;

  2. Considering that most of the users are not familiar with face recognition technology, an efficient visual language is necessary to give instructions, when something is not going well;

  3. Identify external factors that can influence the quality of experience while using Face Recognition.

The first step was to search for parameters that could mediate human-technology communication. In this regard, three aspects of the user interface were observed [10, 11]:

  1. Physical aspects (operating with a device as a physical object);

  2. Handling aspects (the logical structure of interaction with the interface);

  3. Subject-object-directed aspects (the mapping of objects “in the computers” with the objects in the real world).

Fig. 1. Second stage - rectangular matrix.

3 Image Processing Tool

In the first stage, users start from a frontal position and perform yaw and pitch movements of the face responding to the text indications received from the device.

In the second stage, the registration process was divided into three steps for a total of 56 faces, as shown in Fig. 1. The principal problem in this stage was that the interface did not provide comfort and freedom for users. Users commented that the process was slow, that the matrix interface felt artificial, and that it required mechanical movements.

In the third stage, the registration was divided into two steps for a total of 96 faces. As shown in Fig. 2, the initial steps of the previous stage were merged. To improve the human-computer interaction, we implemented in this phase the improvement points detected during user testing, for example:

  • Facilitate head movements at capture time;

  • Reduce the time and effort needed to capture faces;

  • Increase the number of captured faces.

The strategy tested at this stage showed one critical result: a significant number of users failed to turn their face to 30\(^{\circ }\) for the yaw and pitch angles. As a result, during the test phase these angles are not expected to go beyond this limit.

Fig. 2. Third stage - pie chart.

The fourth stage was also divided into two steps. The number of registered faces remained at 96, but the user interaction changed. In the first step, the registration was divided into four quadrants, with eight images taken in each quadrant at 5\(^{\circ }\) of head variation. In the second step, the registration was divided into eight pieces, with four photos captured per quadrant and a maximum head variation of 15\(^{\circ }\).
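
One way a capture tool might decide which quadrant or piece a head pose falls into is to treat (yaw, pitch) as a point in the plane and bin its direction into equal angular sectors. The sketch below assumes this geometry, which the text does not specify; the function names and the use of the Euclidean norm of (yaw, pitch) as the "head variation" are hypothetical.

```python
import math


def sector_index(yaw_deg, pitch_deg, n_sectors):
    """Index (0 .. n_sectors - 1) of the angular sector containing the
    direction of the head pose, counted counterclockwise from the
    positive yaw axis. Use n_sectors=4 for quadrants, 8 for pieces."""
    angle = math.degrees(math.atan2(pitch_deg, yaw_deg)) % 360.0
    # The final modulo guards against angle rounding up to exactly 360.0.
    return int(angle // (360.0 / n_sectors)) % n_sectors


def within_limit(yaw_deg, pitch_deg, max_deg):
    """True if the combined head rotation does not exceed max_deg."""
    return math.hypot(yaw_deg, pitch_deg) <= max_deg


# A pose turned slightly right and up, checked against both step limits.
pose = (4.0, 2.0)
print(sector_index(*pose, n_sectors=4), within_limit(*pose, max_deg=5.0))
print(sector_index(*pose, n_sectors=8), within_limit(*pose, max_deg=15.0))
```

With such a mapping, a tool could count the accepted captures per sector and prompt the user only for the sectors that are still incomplete.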

This last strategy provided a more intuitive and comfortable interaction: by reducing the head movement angle and the capture time, we promoted more natural movement for the user.

4 Users Evaluation

The experiment took place in an environment with controlled lighting conditions, where participants were tested individually with an average time of 3 min per user. A total of 79 users served as participants for this experiment. Their ages ranged from 21 to 45 years.

Table 1 shows the experimental evaluation and the parameter settings. The initial parameters for the first Proof of Concept (PoC) were established based on parameters from the state of the art in face recognition [7]. To cover all possible face poses and shapes, n images are captured (\(n=96\) in the current experiments).

Fig. 3. Fourth stage.

Table 1. Parameter settings.

Regarding the ergonomic parameters, the feedback from all stages allowed us to adjust parameters such as camera height and camera angle. The camera height was changed to 131 cm in the second stage and to 140 cm in the last stage, in response to user discomfort in the experiments. The camera angle was increased only in the last stage, for usability reasons. Another notable issue was capturing rotations (yaw, pitch, and roll) at angles greater than 15\(^{\circ }\), which made users lose control and attention. This difficulty was removed through improvements in the design (Fig. 3).

Among the functionality parameters, the final stage completes the registration in two steps: completing it in a single step proved insufficient and too ambitious, while more than two steps was considered excessive. The parameter adjustments reduced the average registration time to two minutes, with a variance of one minute, which we regard as an acceptable user interaction time.

The proposed changes meet the requirements of the development and design teams. In the future, we intend to improve the communication with the user when the head movements are not performed correctly.

5 Conclusions

In this paper, we proposed a data collection and image processing tool for face recognition applications. We first analyzed the facial data collection process and explained its phases. The most important geometric and visual parameters were discussed and analyzed. In the final stage of the data collection tool, we selected the parameters that conform to the system requirements and also allow comfortable user interaction with the system. The parameters and their adjustments, although considered in other studies, show their importance for specific people and contexts. In conclusion, the system collects facial data covering important poses in a user-friendly manner, and the resulting high-quality data can be used for subsequent face recognition tasks in different scenarios.