1 Introduction

Authentication involves verifying a proof of identity. One of the earliest and most well-known examples of (biometric) authentication is the handwritten signature. Signatures are widely used to sign legally binding documents, even though they are rarely verified and their authenticity is seldom disputed. For example, how many people check the signature on the ID card of the signing party, or even on their credit card in the chip-and-sign payment paradigm? Moreover, signatures are easy to forge, reducing their level of security even further. Yet signatures work well in the real world, probably due to the physical context as well as strong legislation against forgery. Thus, even though handwritten signatures are a great example of a biometric authentication modality, they cannot be adopted in a digital and/or remote authentication scenario. Still, institutions like banks have a great interest in systems for automatic signature validation due to their pervasiveness.

In online authentication systems, passwords became a much better alternative, as they are easier to input and easier for a machine to match. Passwords, however, have become a nuisance for users as well as a security burden. Users suffer from password fatigue due to increasing password lengths and a growing number of password-protected accounts. Additionally, long passwords are troublesome to enter on mobile devices. Furthermore, system providers do not store passwords safely, as password database breaches have become the new ‘normal’ [36]. Hence, passwords have become a security risk: they are often breached or easily phished, and password reuse amplifies the damage of each breach. Furthermore, streaming service providers are pestered by credential sharing, which circumvents the personal service they deliver.

Security researchers have been searching for alternative authentication schemes for the past twenty years [8]. Biometric technologies have great potential, as they are able to recognize the user in a near frictionless manner. However, the adoption of biometrics is still low because recognition performance, security and privacy need to be improved collectively.

Recent advances in machine learning have drastically increased recognition performance, which also aids in securing the biometric system itself. In this chapter, we show where machine learning technology is applied in the biometric authentication pipeline. However, the blade cuts both ways: the use of machine learning also enlarges the attack surface of biometric recognition systems.

This book chapter mainly deals with biometric authentication. Biometric systems are also used for more controversial purposes such as surveillance and law enforcement, which are out of scope of this chapter. Recently, big biometric recognition vendors – such as IBM, Amazon and Microsoft – have refused to sell their biometric recognition products to law enforcement and even openly question whether this research direction should be pursued [33, 35, 61]. The impetus is recent work by Buolamwini et al. [9], which showed that biometric recognition performance differs across genders and skin colors. This discriminatory behavior of biometric systems is probably due to human bias introduced by system designers.

2 Biometric System

In this section we familiarize the reader with the main concepts in biometrics: we first describe the main building blocks [10] and introduce the two main phases, enrollment and query. Second, we map the biometric system onto a typical ML pipeline. Third, we identify the attack surface. Last, we introduce the evaluation metrics.

2.1 System Design

A biometric system can operate as a verification or an identification system. The verifier performs a one-to-one match between the query sample presented at the input and the enrollment sample stored under the user’s profile. The identifier, in contrast, carries out a one-to-many match, comparing the query biometric to a database of enrollment samples. Thus, the tasks answer two different questions: in the first case, we want to verify whether the user is who they claim to be; in the latter case, we are looking for an unknown person within a set of known identities.

Fig. 1. Architecture of a biometric authentication system [10]

Figure 1 shows the typical building blocks in a biometric system; in the real world, however, boundaries are blurred and components might overlap. The first step in the chain acquires a biometric measurement. Sensors – such as fingerprint scanners – measure the unique biometric signal and act as the interface between the user and the system. This can require cooperation by the user, e.g. active measurement in face recognition on mobile devices, or happen in the background without prompting any command, e.g. passive analysis of gait patterns via cameras or motion sensors.

The signal is then forwarded to the biometric feature extraction module, where the features relevant for user identification are extracted. Prior to biometric feature extraction, the measured signal might undergo pre-processing or a sanity check. The biometric template is constructed from the set of biometric features; templates are stored in a database and retrieved during matching.

The database is populated during the enrollment phase. The user presents their biometric trait several times to the sensor. The templates are extracted, encrypted, and stored. The enrollment phase is typically short and does not allow capturing samples in many different contexts. The length of the enrollment phase is limited by usability and security: an enrollment phase that lasts for weeks is not user friendly. Furthermore, the enrollment phase happens in a special security context, as it is a trusted phase. Thus, the input obtained during enrollment has a significant impact on the integrity of the system.

After a user is enrolled in the system, their identity can be verified. During the query phase the newly obtained query sample is compared with the biometric template. In a verification scenario, the user claims an identity which is used to query, or index, the database. In case of identification, the database is scanned to search for a match between the query sample and the biometric templates that populate the database.

The matching is performed within the comparison function block, and, based on its output, a decision is made on whether to grant or deny the user access to the protected environment, i.e. accept or reject the user. Optionally, to deal with changes in biometric traits, a second decision block can decide to add the query sample to the enrollment samples in the biometric database [4].
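To make the comparison and decision steps concrete, the sketch below scores a query feature vector against an enrolled template and thresholds the result. The cosine similarity and the 0.8 threshold are illustrative choices, not prescribed by any particular modality:

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity score in [-1, 1] between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(query, template, threshold=0.8):
    """Decision block: accept the claimed identity if the score clears the threshold."""
    return cosine_similarity(query, template) >= threshold
```

In an identification setting, the same score would instead be computed against every template in the database and the best match returned.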

The pipeline can be implemented online or offline while maintaining its structure. However, the separation between the client and the server is often subtle. We can draw the separation between the sensor and the biometric feature extraction step, or choose to extract biometric features locally. Depending on the situation, appropriate measures to store the biometric template in a secure and private fashion apply.

Fig. 2. Machine learning pipeline and related blocks in the biometric authentication pipeline.

2.2 ML-Enabled Biometric Authentication

Recent developments in machine learning (ML) have boosted biometric recognition accuracy. Depending on the chosen modality, ML can be more or less integrated in the different steps described in Sect. 2.1. For example, feature extraction is commonly performed by means of automated algorithms, which have been demonstrated to outperform statistical and manually engineered features.

We can look at a ML model as a parametric function \(f_{\theta }(x)\), where \(\theta \) are the parameters of the model and x is the input vector. Pre-processing and manual feature engineering are usually performed before feeding x to the model. Training then reduces to a search for the best set of parameters \(\theta \) w.r.t. a pre-defined evaluation metric that drives the training phase.
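As a deliberately small illustration of this search over \(\theta \), the sketch below fits a logistic-regression model by gradient descent on the cross-entropy loss; the learning rate and epoch count are arbitrary illustrative values:

```python
import numpy as np

def f(theta, x):
    """A minimal parametric model f_theta(x): logistic regression."""
    return 1.0 / (1.0 + np.exp(-x @ theta))

def train(x, y, lr=0.5, epochs=200):
    """Search for the parameters theta that minimize the cross-entropy loss."""
    theta = np.zeros(x.shape[1])
    for _ in range(epochs):
        grad = x.T @ (f(theta, x) - y) / len(y)  # gradient of the loss w.r.t. theta
        theta -= lr * grad
    return theta
```

Any model used in the biometric pipeline, from an SVM to a deep network, follows this same pattern with a different \(f\) and loss.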

We can draw a parallel between the ML pipeline and the biometric pipeline as depicted in Fig. 2. The sensor measuring the biometric signal outputs the input x. A set of features relevant for the given task is extracted by the subsequent module, i.e. the feature extractor. As outlined before, pre-processing and signal refinement are optionally carried out between these two steps. The model can incorporate the decision function as well as the template database. We can learn a model that encodes the identity of a pre-defined set of users as an alternative to a traditional distance function, such as the Hamming distance. Finally, a decision is made based on the output of the model.

By looking at the biometric pipeline as a ML pipeline, we introduce new vulnerabilities that are inherited from the nature of the ML system itself. These can be divided into threats affecting the input of the system, the model, and the output.

2.3 Attack Surface

As a biometric system is complex, with different building blocks, its attack surface is quite extensive. An adversary can target every building block in Fig. 2 or the interfaces between them [65]. Depending on the attack point and the attacker’s goal different attack techniques are used [5].

Input Level. The sensor captures the biometric trait. This raw input data is transformed into a machine comparable representation during pre-processing and feature extraction. Therefore, we define the input level as the combination of the sensor and the biometric feature extractor. Depending on the current state of the system – enrollment or query phase – the input data is stored in the biometric database or processed by the comparison function. The following attack techniques target the input level:

  1.

    Spoofing or presentation attack. This type of attack presents fabricated biometric traits to the sensor. Defenses – presentation attack detection or liveness checks – aim to verify whether the submitted data is real or fake.

  2.

    Replay. Replay attacks are launched at the interfaces between the building blocks. Not only can raw sensor data – for example an image found on social media – be replayed; an attacker can also operate one step later and replay extracted feature vectors. Defenses rely on secure, encrypted communication channels that implement some sort of challenge-response protocol; however, this does not defend against insider threats.

Biometric DB Level. Attacks on this level focus on corrupting the database or stealing the templates. Biometric templates are valuable as they contain sensitive information (function creep), or can later be used to perform replay attacks or craft spoofing samples.

  1.

    Adversarial examples. Adversarial examples are not only a threat to the comparison function, but to the biometric database as well. As mentioned before, in some biometric authentication schemes the biometric template is a machine learning model; in other words, the biometric database contains a personalized comparison function. Adversarial examples that target the comparison function are injected during the query phase, while those that target the biometric database are injected during the enrollment phase or, when they target the automated update strategy, during the query phase. A secure enrollment phase is paramount in defending against attacks on enrollment; defenses for the update strategy focus on detection [43].

  2.

    Hill climbing. The goal of hill climbing attacks is to extract the biometric templates from the template database. The attacker injects arbitrary samples at the interface, which they change depending on (fine-grained) output from the authentication system. As these attacks also target the interfaces between the building blocks, the same defenses as for replay attacks apply.

  3.

    Insertion, deletion and theft. Inserting or deleting elements in the biometric database allows the attacker to manipulate the outcome of the authentication system. Alternatively, an attacker can steal the templates to perform replay attacks, to craft spoofing material or to extract private information from the template. Defenses consist of biometric template protection [64] such as secure sketches, fuzzy commitment schemes [72] or cancelable biometrics [66, 83].

Decision Level. During query phase, the comparison function provides a score that encapsulates the likeliness that the query sample originates from the same person as the enrollment samples. As discussed above, this function takes input that is generated with the aid of machine learning or is a machine learning algorithm itself. Attacks on this level manipulate the output of the comparison function:

  1.

    Adversarial examples. The attacker crafts special examples that fool a machine learning component in order to manipulate the output of the comparison function. These examples are injected before they arrive at the comparison function, thus either at the interfaces or at the sensor itself. Defenses against this type of attack focus either on detection or on the robustness of ML algorithms.

  2.

    Malware infection. An attacker or insider manipulates the node that runs the comparison function to thwart the outcome of the comparison function. Good security practices – secure code, end-point security, input sanitation, etc. – defend against this threat. We do not further explore this threat, as it is not a specific biometric system vulnerability.

Depending on the attacker’s goal, knowledge and capabilities, they will apply one of the aforementioned attack techniques. In general, the attacker has the capability to influence either the query phase or the enrollment phase. If the system allows template updating, the attacker can try to influence this mechanism instead of the enrollment phase. An attacker that acts during the query phase aims to evade the system, either through spoofing [76] or by generating physically realizable adversarial examples [77, 98]. Alternatively, the attacker can launch a hill climbing attack [57] to violate biometric template privacy. In contrast, influencing updates or enrollment leads to poisoning attacks. In a poisoning attack [4, 43], the attacker compromises the biometric template to violate either the system’s integrity or its availability, i.e. rendering it useless.

2.4 Evaluation Metrics

Biometric authentication performance is commonly measured with the equal error rate (EER). As illustrated in Fig. 3, the EER occurs at the threshold where the false acceptance rate (FAR), also called false match rate (FMR), equals the false rejection rate (FRR), also called false non-match rate (FNMR).

$$\begin{aligned} FAR = \frac{FP}{FP + TN} \end{aligned}$$
(1)
$$\begin{aligned} FRR = \frac{FN}{FN + TP} \end{aligned}$$
(2)
Fig. 3. The equal error rate is defined by the similarity score at which the FAR and FRR are equal to each other.

Determining the threshold for the EER post factum is incorrect. In a real system, the threshold for the authentication decision is determined upfront, typically on a validation set. The EER threshold from the validation set is then applied to the test set, which determines the final FAR and FRR. To express recognition performance as a single number, the half total error rate is defined as:

$$\begin{aligned} HTER = \frac{FAR+FRR}{2} \end{aligned}$$
(3)

When setting the accept threshold in a real system, the system designer can opt for more security by setting the threshold more strictly (typically higher). A stricter threshold, however, leads to more false rejects, which penalizes usability.
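The metrics above can be computed directly from genuine and impostor score distributions. The sketch below approximates the EER threshold by sweeping the observed scores (a validation-set operation) and then reports the HTER that this fixed threshold yields; a production evaluation would use a full ROC/DET analysis rather than this simple sweep:

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted; FRR: fraction of genuine scores rejected."""
    far = np.mean(impostor >= threshold)
    frr = np.mean(genuine < threshold)
    return far, frr

def eer_threshold(genuine, impostor):
    """Sweep the observed scores; return the threshold where FAR and FRR are closest."""
    candidates = np.sort(np.concatenate([genuine, impostor]))
    gaps = [abs(np.subtract(*far_frr(genuine, impostor, t))) for t in candidates]
    return candidates[int(np.argmin(gaps))]

def hter(genuine, impostor, threshold):
    """Half total error rate at a threshold that was fixed on the validation set."""
    far, frr = far_frr(genuine, impostor, threshold)
    return (far + frr) / 2
```

Applied to a held-out test set with the validation-set threshold, `hter` gives the single-number summary of Eq. (3).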

3 Biometric Feature Extraction

A biometric system deals with biometric feature vectors and biometric templates. Biometric templates are constructed from feature vectors obtained during enrollment; during verification, newly extracted feature vectors are compared with these stored templates. In this section we discuss all steps necessary to obtain these biometric feature vectors and templates. They are acquired by first capturing the biometric trait with a biometric sensor. Then, the raw biometric data is transformed through pre-processing, and finally the biometric feature vector is extracted through feature extraction.

3.1 Sensors

The sensor is the interface between the physical world and the digital biometric authentication system. It captures the biometric trait and presents the digital signal – typically a multivariate time series, a 2D image or a video – to the feature extractor. The nature of the sensor – the type of data captured, sampling rate, resolution, etc. – drastically impacts the performance of the system, but also its security, as some traits are inherently more costly to forge, for example face authentication with 2D or 3D images [38]. Fusing sensors further enhances security without hindering authentication accuracy [14].

Since the sensor provides the input to the system, the integrity of the system is highly dependent on the trust placed in the sensor to provide signals originating from a real biometric trait. If the sensor can be circumvented, arbitrary input can be given to the system, which leads to a variety of attacks. Therefore, biometric systems are often implemented as trusted end-to-end pipelines where the identity verifier controls all building blocks [7], e.g. physical access control and local authentication on smart devices. In remote authentication scenarios, however, the pipeline is not fully under the control of the verifier, as sensing is done locally and (some of) the other building blocks are implemented on a remote server. The degree of trust w.r.t. the input depends on the system design; however, for all types of systems, trusted input is paramount for integrity. Therefore, trust in sensor input can be enhanced by relying on secure hardware modules [30], sensor authentication leveraging physically unclonable functions (PUFs) [13], or sensor authentication by profiling unique sensor noise with ML [16, 17, 47, 86].

A sensor that measures a biometric signal will always capture very personal information that can be used for other purposes than authentication [18]. Therefore, a user should trust that the sensor does not leak the raw biometric data, and that the obtained biometric signal or a derivative will not be misused further down the biometric authentication chain [70].

3.2 Pre-processing

Pre-processing is paramount for recognition performance. In this step, the quality of the signal is checked and improved by removing noise and performing alignment. To this end, pre-processing leverages ordinary signal processing techniques, such as denoising, as well as machine learning. A good example can be found in IMU-based gait authentication, where the signal is segmented into steps; this removes phase shifts and provides a single coherent unit to extract features from. Furthermore, IMU-based gait authentication systems need to account for variations in sensor orientation, e.g. the multivariate time series is transformed into gait dynamic images (GDIs) [103] by estimating the angle between the acceleration vectors at distinct time steps.
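As a hedged sketch of such step segmentation, the snippet below detects heel-strike peaks in an accelerometer magnitude trace with a simple local-maximum rule and cuts the trace between consecutive peaks. The height and gap parameters are illustrative and would need tuning per sensor and sampling rate:

```python
import numpy as np

def segment_steps(accel_mag, min_height=1.2, min_gap=20):
    """Split an accelerometer magnitude trace into steps at heel-strike peaks.

    A sample is a peak if it exceeds both neighbours and min_height, and lies
    at least min_gap samples after the previously accepted peak (debouncing).
    """
    peaks, last = [], -min_gap
    for i in range(1, len(accel_mag) - 1):
        if (accel_mag[i] > accel_mag[i - 1] and accel_mag[i] >= accel_mag[i + 1]
                and accel_mag[i] > min_height and i - last >= min_gap):
            peaks.append(i)
            last = i
    # each consecutive pair of peaks delimits one step segment
    return [accel_mag[a:b] for a, b in zip(peaks, peaks[1:])]
```

Real systems typically add band-pass filtering before peak detection and length-normalize the resulting segments.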

Face recognition is another example of a biometric modality with a vast body of literature. Older methods subdivide the pre-processing pipeline of face authentication systems into four steps: face detection [95, 97, 101], pose estimation [50], landmark detection and, finally, face alignment based on these landmarks. Face alignment procedures are typically categorized as either template fitting [104] or regression based [12]. The downside of these older methods is that they rely on manual feature extraction methods that try to extract shapes, which is challenging in unconstrained environments with varying lighting conditions, occlusions, radical face poses and different facial expressions. Convolutional neural networks (CNNs) perform a lot better in these real-world environments. Therefore, the most recent face alignment systems rely on deep learning technology. One of the most well-known examples is MTCNN [99], which is used, among others, in the more recent implementations of FaceNet [74]. Still, alignment of faces remains challenging under different poses. Recently, GANs have been proposed to deal with poses by performing pose estimation, pose correction and finding a pose-invariant representation [85]. Thus, more recent methods no longer subdivide face alignment into different steps and perform the operation in one go.

We provide a last example in fingerprint authentication. As fingerprint matching happens mainly based on the coordinates of extracted minutiae, a correct alignment is important. Recent work by Shuch et al. corrects for rotations by leveraging a Siamese network architecture built on top of CNNs [75].

3.3 Feature Extraction

The feature extraction phase aims to find a low-dimensional – compared to the original input signal – representation of the input data such that inter-person similarity is minimized and intra-person similarity is maximized. Finding a discriminative feature set is paramount for recognition performance. Before the introduction of deep learning, feature extraction was performed manually, by statistically describing the signal or extracting discriminating features in different domains, e.g. the time domain or the frequency domain. Deep learning improved upon this situation by automatically extracting discriminative representations of highly complex signals. Therefore, in most biometric modalities deep learning is used to find a discriminative representation of the biometric data. A noteworthy counterexample is fingerprints, where most systems still rely on minutiae for matching [60]. Even though minutiae are handcrafted features, minutiae extraction benefits from deep learning due to poor image quality, skin conditions, and other sources of noise [19, 53, 73, 82].

Hand-Crafted Features. Robust feature extraction depends heavily on a good pre-processing phase. For example, IMU-based gait authentication is highly dependent on robust step segmentation [22, 28, 46, 69]. A user’s step characteristics are represented by a length-normalized time sequence that captures one step. Matching is then performed by dynamic time warping between the average of the enrollment samples and the query sample.
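The matching step can be sketched with the textbook DTW recurrence. It assumes the steps have already been length-normalized, as described above, so that the enrollment samples can be averaged; the distance threshold is an illustrative placeholder:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D step sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest warping path reaching this cell
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def match(query_step, enrollment_steps, threshold=5.0):
    """Compare the query step to the average of the (length-normalized) enrollment steps."""
    template = np.mean(np.stack(enrollment_steps), axis=0)
    return dtw_distance(query_step, template) <= threshold
```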

Step segmentation, however, is hard; therefore, later approaches no longer segment steps but extract statistical features from sliding or tumbling windows over the accelerometer and gyroscope traces [44, 93]. Statistical features are powerful for describing the general characteristics of a signal, but often fail to adequately capture spatial-temporal relations. Hidden Markov models can capture these temporal relations by relying on the cyclic nature of gait patterns [54, 87]. HMMs work well in lab conditions; however, they seem to lack the expressiveness to function well in less constrained environments.

Automated Feature Extraction. In the visual domain, CNNs and RNNs have been applied successfully to capture spatial-temporal relations in complex data. IMU-based gait authentication also benefits from these techniques. CNNs have been applied successfully to segmented steps [27], gait dynamic images [102] and variable-length sequences [88]. Alternatively, RNNs have been adopted in a similar manner, both on segmented steps [25] and windowed signals [96]. Even when deep learning techniques are applied, signals are pre-processed and different representations of the signals are fed to the system, e.g. gait dynamic images [103] and the extraction of roll and pitch [88] to accommodate changes in sensor orientation, or step segmentation to align the phase of the signal. In other words, unnecessary sources of noise are removed.

3.4 Attacks and Defenses

As mentioned in Sect. 2.3, an attacker can try to attack the system by presenting something different than the real biometric trait to the sensor (presentation attack), or by injecting forged or observed samples at the interfaces between the different components (replay attack). For example, in face recognition, presentation attacks – also known as spoofing [76] – are performed by presenting a mugshot, a video or a mask to the camera. Alternatively, fingerprint sensors can be tricked by fakes made from wood glue or printouts with conductive ink [11]. To perform a replay attack, an adversary needs to obtain access to the interfaces between the different components, for example by physically opening devices or by breaking ’secured’ communication channels, as in insider attacks.

Some authentication modalities are more robust to spoofing as it is more costly to produce a fake. 2D face authentication is by far the cheapest to spoof, as personal images are found all over the internet and high quality print outs are relatively cheap. 3D face authentication is more expensive to spoof, as it requires a mask to be built. On top of that, obtaining an accurate 3D representation of a face without the real model is harder.

Presentation attack detection (PAD) – also known as liveness detection – is an unsolved problem in the biometrics community. The topic has dedicated tracks at the major biometrics conferences: the IEEE International Conference on Biometrics: Theory Applications and Systems (BTAS), the IAPR International Conference on Biometrics (ICB) and the International Joint Conference on Biometrics (IJCB); and a biennial competition, LivDet [1, 49]. Typically, liveness detection is done either by adding additional hardware or in software by looking for specific characteristics of spoofs. Software-based liveness checks rely on ML: the 2015 LivDet competition winner [56] adopts CNNs to detect spoofs. They show that their method is superior to earlier proposed solutions that rely on manually engineered features, such as local binary pattern (LBP) extraction, and more traditional ML methods like SVMs. Raja et al. [63] further explored the difference in expressiveness between manually engineered features – texture based, LBP, etc. – and automated features learned by a deep learning model, and confirm the earlier observation: PAD benefits from deep learning based features, because they are more robust and lead to better recognition performance. Sadly, attackers can also leverage the power of machine learning to produce new attack vectors. For example, due to recent successes in speech synthesis, systems that accept arbitrary speech, i.e. authentication systems that do not rely on a fixed keyphrase, have become easy to spoof. Luckily, recent work by Yan et al. [94] showed that it is possible to characterize the difference between sound generated by humans and sound generated by machines.
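To make the LBP features mentioned above concrete, the following sketch computes the basic 3×3 LBP code image and its histogram, the kind of hand-crafted texture feature that the deep learning based PAD methods were compared against. This is the plain variant, not the uniform or rotation-invariant ones used in much of the PAD literature:

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 local binary pattern: threshold the 8 neighbours of each
    pixel against its value and pack the results into a code in [0, 255]."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # neighbour offsets, clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neighbour >= gray[1:h - 1, 1:w - 1]).astype(np.uint8) << bit
    return out

def lbp_histogram(gray):
    """256-bin normalized histogram, usable as a texture feature vector for PAD."""
    hist = np.bincount(lbp_image(gray).ravel(), minlength=256)
    return hist / hist.sum()
```

The histogram would then be fed to a classifier such as an SVM to separate real captures from spoofs.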

Defenses against replay attacks are typically implemented in one of two ways: with a challenge-response system, or by correlating the biometric signal with an additional out-of-band signal received from another sensor. The challenge-response system relies on applying a signal that is beyond the control of the attacker and then retrieving this same signal from within the biometric data. For example, one can use the screen of a smartphone to produce a color-coded signal during face authentication. If the reflections match the applied pattern, one can be fairly sure that the biometric trait is not replayed [78, 81]. Alternatively, some biometrics rely on applying a changeable signal; the biometric system can thus observe the unique reaction to the signal, i.e. the biometric used for matching, and the signal itself for PAD. An example of such a system is hand geometry [41]. EEG is an example of a more complex biometric with the same potential.
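The same challenge-response principle can be applied digitally at the sensor-server interface. Assuming the sensor holds a shared key in secure hardware (an assumption, as is every function name below), a fresh nonce per capture prevents a recorded sample-tag pair from being replayed:

```python
import hashlib
import hmac
import secrets

def issue_challenge():
    """Server side: a fresh random nonce that is never reused."""
    return secrets.token_bytes(16)

def sign_capture(sensor_key, challenge, sample_bytes):
    """Trusted sensor: bind the freshly captured sample to this challenge."""
    return hmac.new(sensor_key, challenge + sample_bytes, hashlib.sha256).digest()

def verify_capture(sensor_key, challenge, sample_bytes, tag):
    """Server side: accept the sample only if its tag matches this challenge."""
    expected = hmac.new(sensor_key, challenge + sample_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

The screen-reflection schemes cited above apply the same idea in the physical domain, with the color pattern playing the role of the nonce.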

Defenses based on out-of-band signal correlation obtain two signals, one with the biometric and another via a different communication channel and a different sensor. Both signals are correlated since they capture a similar pattern. For example, the heart rate can be obtained from a recording of a face as well as through a PPG sensor or a finger held to the camera [80]. Another example requires moving the camera during face authentication; the movements derived from the video are then matched to those observed by the IMU sensor [42]. We want to point out that most of the replay prevention methods also defend against spoofing attacks, as they defend against a stronger attacker.

4 Biometric DB

Storing a feature vector in a biometric DB threatens the user’s privacy as well as the integrity of the whole authentication system. Therefore, modifications are applied to transform a feature vector into a template, which is retrieved for matching at verification time. We discuss here the requirements for secure and private template storage, the techniques that ensure its protection, and the attacks that aim to alter it.

4.1 Template Enrollment

The template enrollment phase populates the template DB (cf. Fig. 1). Depending on the given modality, these representations can vary greatly and present different implementation challenges. The stored template can be either a (set of) feature vector(s) or a model function. Furthermore, these templates can be self-updatable or static.

Static templates are usually updated on demand, per request of the user or the service provider. Self-adapting templates, in contrast, deal with changes in appearance by applying continuous updates. The system can simply harness the samples provided by the user at verification time, thus becoming invisible to the users themselves. Self-updating strategies help cope with natural and artificial modifications of one’s biometric trait, such as a haircut or old-age wrinkles for face images. In practice, the decision made at verification time is fed back to the biometric DB, which updates the stored template according to a specific update policy (cf. Fig. 1).
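A self-updating policy can be sketched as a conservative blend of the stored template with well-matching queries. The margin and blending factor below are illustrative; as discussed in Sect. 2.3, such update mechanisms are a poisoning target and must be tuned carefully:

```python
import numpy as np

def maybe_update(template, query, score, update_threshold=0.95, alpha=0.1):
    """Conservative self-update policy: only blend in queries whose match
    score clears a margin well above the accept threshold."""
    if score >= update_threshold:
        # exponential moving average keeps the template close to its history
        return (1 - alpha) * template + alpha * query
    return template
```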

The storage step is highly sensitive, since losing this biometric information, e.g. following a data breach, threatens the availability and integrity of the system. Moreover, biometric templates are unique per user and non-renewable. Hence, failing to securely store them jeopardizes users’ privacy.

4.2 Template Matching

Template matching is performed in the Comparison Function module. This module, however, does not always apply its own function. If the template consists of a ML model, for example a one-class SVM, the matching function is embedded in the template itself; the comparison function is thus effectively supplied by the template. Conversely, if the template is a (set of) feature vector(s), matching happens based on the function defined in the Comparison Function module, which is often some sort of distance function.
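For the feature-vector case, a typical comparison function is a normalized Hamming distance over binarized templates, as used for example with iris codes; the 0.25 threshold below is an illustrative value, not one taken from any deployed system:

```python
import numpy as np

def hamming_distance(a, b):
    """Fraction of differing bits between two binary templates, in [0, 1]."""
    return float(np.mean(a != b))

def match_template(query_bits, enrolled_bits, threshold=0.25):
    """Accept when the templates differ in at most a threshold fraction of bits."""
    return hamming_distance(query_bits, enrolled_bits) <= threshold
```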

4.3 Attacks and Defenses

In real-world applications, a template should not be stored in plaintext, to prevent breaches and the permanent loss of users’ identity. Encrypting the reference template protects against data breaches by preventing reconstruction. However, the template must be decrypted for matching, and can thus potentially be stolen at each new authentication attempt. Therefore, biometric template protection (BTP) schemes apply a transformation that does not require the raw biometric during the query phase. BTP schemes instead store derived artefacts which comply with the following three requirements (ISO/IEC FCD 24745):

  1.

    Irreversibility: It should be computationally hard to reconstruct the reference template.

  2.

    Unlinkability: It should not be possible to link different templates derived from the same biometric sample.

  3.

    Revocability: The reference template should be revocable in such a way that the original biometric sample remains re-usable across services.

By allowing the reuse of a single sample across different services, BTP prevents cross-matching and minimizes the risks associated with unauthorized breaches. The transformation requires a secret key that is first provided during enrollment. At authentication time, the features and the key are used to compute a reference template that is matched with the stored one. Thus, the matching happens in the transformed domain.

Irreversibility and revocability are achieved by using cryptographic one-way functions. The most delicate step of the process is the management of the keys. In BTPs, this is done in two ways [68]: (1) biometric cryptosystems bind a key to, or generate a key from, a biometric feature vector. The retrieval and the generation are driven by so-called helper data, which is provided by the user during both operational phases. (2) Cancelable biometrics apply a transformation that can be invertible or non-invertible. The key used to perform the transformation is protected via a second factor, e.g., a password or a second biometric trait.
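As a concrete illustration of the cancelable-biometrics idea, the following Python sketch uses a key-seeded random projection; this is a simplified stand-in for real non-invertible transforms, not an actual BTP scheme:

```python
import numpy as np

def cancelable_transform(features, key, out_dim=32):
    # The key seeds a pseudo-random projection matrix; revoking a
    # template amounts to issuing a new key, and the dimensionality
    # reduction makes exact inversion computationally hard.
    rng = np.random.default_rng(key)
    projection = rng.standard_normal((out_dim, features.shape[0]))
    return projection @ features

features = np.random.default_rng(7).random(128)  # raw feature vector
enrolled = cancelable_transform(features, key=1234)
query = cancelable_transform(features, key=1234)
# Matching happens entirely in the transformed domain:
assert np.allclose(enrolled, query)
# A different key yields an unlinkable template from the same sample:
revoked = cancelable_transform(features, key=9999)
```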

Key challenges of BTPs include [51]:

  • Take into account advances in the feature extraction process. Templates derived from fingerprints and face images usually yield high accuracy because they are easy to transform into a binary and consistent representation; a common example is the use of Bloom filters [67]. However, face images are bound to deep learning algorithms that extract features which might be unsuitable for existing template protection schemes. Moreover, voice is one example of a biometric trait that is unsuitable for traditional transformations: it has proven to be less consistent and to yield poorer results, therefore relying on a model rather than a template for authentication [52].

  • Bridge the gap between performance and security. Estimating the biometric distribution is key to trading off privacy against accuracy. However, this is rarely achieved in practice.

  • Address the unlinkability of biometric templates. Unlinkability evaluation is an unsolved problem in BTP schemes, as the schemes are often bound to the use of a second factor [6]. General frameworks exist to evaluate the unlinkability of templates generated using different keys [32]. A different approach to circumvent this problem is to work in the encrypted domain by harnessing Fully Homomorphic Encryption (FHE) [84]. On the one hand, FHE has the advantage of preventing degradation of data, since the operations are performed in the encrypted domain but on the original, unmodified sample. On the other hand, only a limited set of operations is possible, and a substantial computational overhead applies.

Privacy Leakages. BTP schemes are effective against data breaches: a stolen template can be revoked, thereby reducing the risk of theft and loss. However, other blocks of the system still leak sensitive data about the user [62]. An attacker might harness the verification output to gain knowledge about the system and the user. The amount of information varies based on the output received from the system: it can be a binary response (accepted/rejected) or a score (e.g. a model output). In the latter case, we talk about Hill-Climbing attacks [57, 58]. Hill-Climbing has been used to recover encrypted face images by observing a quantized output [2]: an input is presented to the system and compared to a reference target template, and modifications are applied based on the output matching score. Throughout the process, only modifications yielding a higher matching score are retained.
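The hill-climbing loop just described can be sketched as follows; the oracle below is a toy stand-in for the matching score leaked by a real system:

```python
import numpy as np

def hill_climb(score_fn, dim, iters=2000, step=0.05):
    # score_fn is the attacker's oracle: the matching score the
    # system leaks for an arbitrary input.
    rng = np.random.default_rng(0)
    candidate = rng.standard_normal(dim)
    best = score_fn(candidate)
    for _ in range(iters):
        perturbed = candidate + step * rng.standard_normal(dim)
        score = score_fn(perturbed)
        if score > best:  # retain only score-improving modifications
            candidate, best = perturbed, score
    return candidate

# Toy oracle: negated distance to a hidden reference template.
target = np.random.default_rng(1).standard_normal(16)
oracle = lambda x: -float(np.linalg.norm(x - target))
recovered = hill_climb(oracle, dim=16)
```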

To counteract information leakage, output randomization could be used. This protects the model and the templates by preventing or hindering reconstruction attacks. These attacks are usually performed as a first step before carrying out other attacks for which knowledge is key (e.g. adversarial attacks).

Adversarial ML: Poisoning. Poisoning attacks aim to inject a set of malicious samples into a target training set. The adversary targets the self-update strategy of the ML model, such that every re-training step moves the function learned by the model in the direction specified by the adversary. This contamination leads to two threats: a denial of service, which disrupts the service, and a careful modification of the model function that selectively grants or denies access to a target user.

In the context of biometric authentication, this attack targets self-adapting templates during the enrollment phase. The goal of the adversary can be either to find the best modification to be authenticated as a rightful user (untargeted attack) or to deny authentication to a predefined victim (targeted attack). In the targeted poisoning case, Biggio et al. [4] have analyzed the resilience of a biometric template under white-box access to the model and no further protection layer. Garofalo et al. [29] have investigated the poisoning setting for a one-class SVM model authenticating a single user. This last attack relies on strong assumptions on the adversary’s capabilities. The adversarial sample needs to comply with the following four requirements: (1) it can bypass the PAD layer, (2) it is accepted as a sample for the victim user, (3) it is able to trigger a template update (i.e. accepted with high certainty), and (4) it acts within a realistic budget in terms of number of queries. These assumptions are relaxed by Lovisotto et al. [43], who perform an attack on biometric systems during the query phase, leveraging template updating.
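A minimal sketch of how poisoning a self-update strategy drifts a template; the averaging update rule and step sizes below are illustrative assumptions of ours, not those of the cited attacks:

```python
import numpy as np

rng = np.random.default_rng(0)
template = rng.normal(0.0, 1.0, size=8)   # victim's enrolled centroid
attacker = rng.normal(4.0, 1.0, size=8)   # attacker's own features

def self_update(template, sample, alpha=0.2):
    # Accepted samples are averaged into the template (self-adaptation).
    return (1 - alpha) * template + alpha * sample

poisoned = template.copy()
for _ in range(20):
    # Craft a sample just beyond the current template, nudging it
    # toward the attacker's feature distribution at every update.
    crafted = poisoned + 0.3 * (attacker - poisoned)
    poisoned = self_update(poisoned, crafted)

# The template has drifted toward the attacker:
print(np.linalg.norm(poisoned - attacker)
      < np.linalg.norm(template - attacker))  # True
```

Each iteration moves the template a constant fraction of the remaining gap toward the attacker, so after enough accepted updates the attacker's own samples fall inside the acceptance region.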

5 Comparison Functions

Authentication/identification tasks are comparisons between (a subset of) previously enrolled features (the biometric template) and the biometric features originating from a new query sample. In this section we give more details on the most commonly used comparison functions and the attacks against them.

5.1 Distance Functions

There are a few different options for the actual comparison function. If one wants to work with simple distance functions between features, the choice usually depends on the model the features were derived from [91]. For example, in facial authentication systems, the current state-of-the-art feature extractors are trained with variants of the well-known cross-entropy loss that have an inherent angular component [21]. Therefore, the features derived from those models should be compared with an angular distance. On the other hand, some feature extractors for facial authentication systems are trained with losses that explicitly emit features separated in \(L_2\) distance [74], which naturally leads to the use of Euclidean distance functions to compare features in subsequent authentication attempts.
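The two metrics can be written down directly; the example vectors below are ours:

```python
import numpy as np

def angular_distance(a, b):
    # Cosine-based distance, suited to features from models trained
    # with angular-margin variants of the cross-entropy loss.
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

def euclidean_distance(a, b):
    # L2 distance, suited to features from models trained with
    # explicit L2-separation losses (e.g. triplet-style losses).
    return np.linalg.norm(a - b)

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])  # same direction, different magnitude
print(angular_distance(a, b))   # 0.0: identical under the angular metric
print(euclidean_distance(a, b)) # 1.0: distinct under the L2 metric
```

The example highlights why the metric must match the extractor: the same pair of features can be a perfect match under one distance and a clear mismatch under the other.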

Regardless of which distance function is used, there are different schemes for choosing which two vectors to compare. Since a single enrolled feature vector is not a good representative of the enrolled person, the new feature vector is usually compared against a weighted combination of enrolled feature vectors, called the centroid. The weighting scheme can be simple, where each feature vector is weighted equally; alternatively, a higher weight is given to the most recent feature vectors. These are a few of the often used options, but many more exist [3].

Another variable to play with is which subset of the feature vectors to take into account. The intuitive option is to use every enrolled feature vector in the centroid: the more feature vectors are used, the better the centroid should approximate the actual centroid of the target identity. But different schemes exist that offer different trade-offs. One could, for example, use only the subset of enrolled feature vectors that have the highest distance to the new feature vector. Intuitively, this makes the system more resilient to false accepts, since the hardest comparison is chosen, while it should not make a difference for benign comparisons. Again, multiple options exist, each suited for different system designs and trading off security, usability and other metrics.
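A sketch of the centroid and hardest-subset schemes; the function names and the choice of k = 3 are ours:

```python
import numpy as np

def centroid(enrolled, weights=None):
    # Equal weighting by default; a recency-biased scheme would pass
    # higher weights for the newest enrolled vectors.
    return np.average(enrolled, axis=0, weights=weights)

def hardest_subset_distance(enrolled, query, k=3):
    # Compare the query only against the k enrolled vectors farthest
    # from it: the hardest comparison, intended to resist false accepts.
    dists = np.linalg.norm(enrolled - query, axis=1)
    hardest = enrolled[np.argsort(dists)[-k:]]
    return float(np.linalg.norm(centroid(hardest) - query))

enrolled = np.random.default_rng(0).standard_normal((10, 8))
query = enrolled.mean(axis=0) + 0.1
full_distance = float(np.linalg.norm(centroid(enrolled) - query))
hard_distance = hardest_subset_distance(enrolled, query)
```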

These distance functions are often used in open-ended systems and are therefore usable for both identification and verification. For verification, it suffices to query the desired centroid from the template database, calculate the feature vector from the authentication attempt, and decide whether or not it is authenticated (based on some threshold, e.g. the EER operating point). Identification is much the same, except that instead of making a binary decision against one specific centroid, the distance to all centroids (identities) is calculated, and the identity with the closest match is returned. A variant also returns the closest identity, but only if the distance falls below a specified threshold. This ensures that returning the “least negative” match is not detrimental to the security of the system, and can be thought of as having an “other” class.
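The thresholded identification variant can be sketched as follows; the identity names and the threshold value are illustrative:

```python
import numpy as np

def identify(centroids, query, threshold):
    # Open-set identification: return the closest enrolled identity,
    # or None (an implicit "other" class) when even the best match
    # exceeds the decision threshold.
    distances = {name: np.linalg.norm(c - query)
                 for name, c in centroids.items()}
    best = min(distances, key=distances.get)
    return best if distances[best] < threshold else None

centroids = {"alice": np.array([0.0, 1.0]), "bob": np.array([1.0, 0.0])}
print(identify(centroids, np.array([0.1, 0.9]), threshold=0.5))  # alice
print(identify(centroids, np.array([5.0, 5.0]), threshold=0.5))  # None
```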

5.2 Learned Functions

SVMs are also a common decision function in biometric authentication. Since authentication cannot easily be modelled as a binary classification task (it is hard to model the negative set, since all other users belong to it), one-class SVMs are used [27, 29]. They are commonly used to model e.g. smartphone access, or other systems where there is only one legitimate user. They are trained on feature vectors from that legitimate user, which form the positive set, and are then able to discern new feature vectors: the user is authenticated if the new feature vector is classified as positive by the one-class SVM.
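Using scikit-learn, a one-class SVM authenticator for a single legitimate user can be sketched as follows; all data here is synthetic:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Enrollment: feature vectors from the single legitimate user.
legitimate = rng.normal(loc=0.0, scale=0.5, size=(50, 4))

# Trained on positive data only; nu upper-bounds the fraction of
# training outliers and acts as a usability/security knob.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(legitimate)

# Authentication: +1 means accepted as the legitimate user, -1 rejected.
genuine = rng.normal(loc=0.0, scale=0.5, size=(1, 4))
impostor = rng.normal(loc=5.0, scale=0.5, size=(1, 4))
print(model.predict(genuine))   # usually [1]
print(model.predict(impostor))  # [-1]
```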

Another way to perform open set recognition is by calculating the log-likelihood ratio between the probability of the biometric sample under a universal background model (UBM) and under a personally adapted model [72, 87]. These universal background models used to consist of Gaussian mixture models (GMMs); in speaker verification, however, a more discriminative low-dimensional representation called i-vectors (identity vectors) was found. I-vectors are compared by the use of linear discriminant analysis (LDA) [20].
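A toy log-likelihood-ratio verifier with scikit-learn GMMs; for brevity the user model is trained directly on user data, whereas classical systems MAP-adapt the UBM, and all data below is synthetic:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
population = rng.normal(0.0, 2.0, size=(500, 3))  # many speakers
user = rng.normal(1.5, 0.3, size=(60, 3))         # the claimed speaker

ubm = GaussianMixture(n_components=4, random_state=0).fit(population)
user_model = GaussianMixture(n_components=2, random_state=0).fit(user)

def llr(sample):
    # Positive log-likelihood ratios favour the claimed user.
    x = np.asarray(sample).reshape(1, -1)
    return float(user_model.score(x) - ubm.score(x))

print(llr([1.5, 1.5, 1.5]) > 0)  # True for a typical genuine sample
```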

5.3 Attacks and Defenses

Attacks Against Classic Distance Functions. Since the classic distance functions rely only on the new feature vector, it is the feature extraction model that is the weak link in this case. If the model can be tricked into emitting a feature vector that is close enough to an intended target vector, the attack is successful. This is easily achievable with the now well-known adversarial examples.

Adversarial ML. While machine learning has been very beneficial for authentication systems, it has introduced additional attack vectors for attackers to exploit. Evasion attacks in this context are defined as making the decision function output the wrong decision, to the benefit of the attacker. A few variations exist under the umbrella of evasion attacks, such as flipping the decision in binary cases (e.g. authenticated vs. not authenticated for a claimed identity), or changing the final classification of a closed-ended system [5, 77]. The latter is less common, since authentication systems are better suited to open-ended designs.

Another important distinction in evasion attacks concerns the intended classification. For binary cases this is self-evident, but for open-ended systems there are two variations: either an attacker can trick the decision function into identifying them as a specified target (targeted evasion attacks) [39, 98], or as any target that is not the original (untargeted evasion attacks). In the case of verification, where matching is performed against a chosen identity, the target is usually chosen (for obvious reasons). In the case of identification, both attacks are valid options, depending on the goal of the attacker.

Defenses. Defenses against adversarial input focus on improving the robustness of the machine learning classifier during training [34, 71, 92] or on detecting adversarial input [15]. The two main strategies to improve model robustness are adversarial training and adapting the loss function to obtain better separated and more condensed classes in feature space. Detection systems look for adversarial noise, which is typically constrained in order to be physically realizable.

6 Summary

Biometric authentication might just be the future for authentication systems: it is more user friendly and offers unique security properties. Machine learning plays a massive role in improving recognition performance by constructing more complex comparison functions or providing more discriminative representations of highly complex biometric data. Furthermore, machine learning aids in solving some security problems, such as spoofing and replay attacks. Machine learning, however, also enlarges the attack surface by introducing new attack vectors that can be exploited.

To study the impact of machine learning on biometric authentication, we introduced the different building blocks of such a system. We then showed how these biometric building blocks map onto a machine learning pipeline. Within this comprehensive model we discussed the security and privacy challenges of biometric authentication systems, thereby providing the reader with a framework to further study the application of machine learning to biometric system security. To end this chapter, we summarize how biometric authentication is best modelled as a machine learning problem; why deep learning does not only provide benefits, but also introduces new threats; and what (research) challenges still remain.

6.1 Biometric Authentication as an Open Set Problem

Biometric authentication, i.e. the verification of a claimed identity, is within the security community sometimes modelled as a closed-world binary classification problem [79, 89, 100]. However, this has several disadvantages: (1) from a practical perspective it makes the classifier hard to train, as the data set is highly unbalanced due to short enrollment phases; there is a limited amount of positive data – you – while a lot of data is necessary to represent the negative class – not you. (2) It is close to impossible to represent the negative class accurately: the classification of a previously unseen person will depend heavily on the population used to represent the negative class, and if that person has traits more similar to the positive class than to any of the negative samples, they will be authenticated. (3) Zhao et al. [100] showed that closed-world binary classifiers do not sufficiently constrain their decision boundaries, which leads to worse than expected performance.

All of the biometric authentication work discussed in the previous sections models the verification problem as an open-world problem. This corresponds better with the limited amount of enrollment data available in real biometric systems. Furthermore, an open-world assumption is a more realistic problem representation, as the whole population cannot be seen at training time. The most used open-world techniques are: dynamic time warping, which measures the similarity of two temporal sequences; one-class support vector machines; universal background models; and few-shot learning with siamese networks or other deep learning based feature extractors.

6.2 Threats Linked to New Factors and Deep Learning

Deep learning has paved the way for the use of novel bio- and behavior-metrics for authentication, like finger vein [40], ECG [37], hand geometry [41], EEG or eye movement [24, 45]; however, these techniques enlarge the attack surface. The use of deep learning in critical infrastructure sparked a whole new domain of research, i.e. adversarial examples. Additionally, human bias is reflected in machine learning algorithms, which leads to unfair algorithms: due to this bias, the security level of authentication systems differs depending on skin color, ethnicity and gender [9, 55]. Lastly, due to deep learning, function creep has become a bigger threat, because more complex functions can be learned – including functions that map biometric feature vectors to soft biometrics or even the raw biometric data [31, 48]. In general, deep learning has made any form of personal data more sensitive, as the amount of leakage is unclear. Conversely, as deep learning enables the estimation of biometric features from other biometric traits, sharing any kind of personal data poses a security risk for biometric authentication [23].

6.3 Future Directions

Biometric authentication systems still have a long way to go before they become a satisfactory authentication methodology that can be applied in a wide set of use cases. More research is needed to further reduce user interaction, limit the attack surface and ensure that biometric authentication does not violate user privacy. Ross et al. [70] described the future directions for biometrics in general. We focus here on biometric authentication specifically.

To make biometric authentication systems fully frictionless, they should be able to run unobtrusively in the background, constantly monitoring the user context and natural human-computer interactions. Biometrics must help drive the shift from the current single-shot authentication paradigm to one of continuous authentication. As a biometric modality that fits all use cases does not exist, the future biometric authentication system will most probably leverage many different biometric modalities and sensors [26]. Such a system should be adaptive, deal with missing modalities and temporal discrepancies, and decide when to perform step-up authentication at the penalty of user interaction [90].

Throughout this book chapter we have shown the elaborate attack surface of biometric authentication systems. The vast body of literature on presentation attack detection still does not describe the holy grail: we need anti-spoofing methods that generalize over different attack types and datasets [59]. Defenses that are successful against replay attacks and spoofing rely on challenge-response methodologies [41, 81] or correlate different biometric signals [80]. However, these techniques add more friction or are not universally applicable. Lastly, the use of deep learning introduces new threats such as fooling attacks, backdoor attacks – where the model is trained to react differently to pre-defined input patterns – and model stealing – where the expensive machine learning model is stolen. New research can focus on either showing the extent of these threats or coming up with novel detection strategies.

Biometric authentication keeps struggling with privacy threats [58]. More research is necessary to better understand the extent to which private attributes are leaked, and how to limit this leakage without sacrificing recognition performance. Alternatively, future research could focus on the security threat that the sharing of biometric traits in other contexts – for example, medical data, fitness trackers, social media, online DNA tests – poses to biometric authentication.