Fusion of biometric algorithms in the recognition problem

https://doi.org/10.1016/j.patrec.2004.09.021

Abstract

This note concerns the mathematical aspects of fusing several biometric algorithms in the recognition (identification) problem. It is assumed that a biometric signature is presented to a system, which compares it with a database of signatures of known individuals (the gallery). On the basis of this comparison, an algorithm produces similarity scores between this probe and the signatures in the gallery, which are then ranked by their similarity to the probe. The suggested procedures define several versions of aggregated rankings. An example from the Face Recognition Technology (FERET) program with four recognition algorithms is considered.

Introduction

This note concerns the mathematical aspects of fusing algorithms in the recognition (identification) problem, where a biometric signature of an unknown person, also known as the probe, is presented to a system. This probe is compared with a database of, say, N signatures of known individuals, called the gallery. On the basis of this comparison, an algorithm produces similarity scores between this probe and the signatures in the gallery, whose elements are then ranked by their similarity to the probe. The top matches, those with the highest similarity scores, are expected to contain the true identity.

A variety of commercially available biometric systems are now in existence; however, in many instances there is no universally accepted optimal algorithm. For this reason it is of interest to investigate possible aggregations of two or more different algorithms. See Xu et al. (1992), Ho et al. (1994), Lam and Suen (1995), Kittler et al. (1998), and Jain et al. (2000) for a review of different schemes for combining multiple matchers. A common feature of many recognition algorithms is the representation of a biometric signature as a point in a multidimensional vector space. The similarity scores are based on the distance between the gallery and the query (probe) signatures in that space (or between their projections onto a subspace of smaller dimension). Because of this inherent commonality, the similarity scores of two different algorithms, and the resulting orderings of the gallery, can be dependent. For this reason traditional methods of combining different procedures, such as classifiers in pattern recognition, are not appropriate. Another reason for the failure of popular methods like bagging and boosting (e.g. Schapire et al., 1998; Breiman, 2004) is that the gallery size is much larger than the number of algorithms involved. Indeed, the majority voting methods used by these techniques (as well as in the analysis of multi-candidate elections and in social choice theory, Stern, 1993) are based on an aggregated ranking of a fairly small number of candidates obtained from a large number of voters, judges, or classifiers. The axiomatic approach to this fusion leads to combinations of classical weighted means (or random dictatorship) (Marley, 1993).

As the exact derivation of the similarity scores is typically unknown, the use of nonparametric measures of association seems appropriate. The utility of rank correlation statistics, such as Spearman’s rho or Kendall’s tau, for measuring the relationship between different face recognition algorithms was reported by Rukhin et al. (2002). Rukhin and Osmoukhina (in press) employed so-called copulas to study the dependence between different algorithms. They showed that for common image recognition algorithms the strongest (positive) correlation between the algorithms' similarity scores occurs for both large and small rankings. Thus, in all observed cases the algorithms behave somewhat similarly, not only in assigning the closest images in the gallery but also in deciding which gallery objects are most dissimilar to the given image, exhibiting significant positive tail dependence. This finding is useful for the construction of new procedures designed to combine several algorithms, and it also underlines the difficulty of applying boosting techniques directly.
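The two rank correlation statistics named above can be illustrated with a minimal sketch. It assumes two tie-free rankings of the same gallery, given as lists of 1-based ranks; the function names and the sample rankings are hypothetical and are not taken from the cited studies.

```python
from itertools import combinations


def spearman_rho(p, q):
    """Spearman's rho for two tie-free rankings of the same n items."""
    n = len(p)
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(p, q):
    """Kendall's tau: (concordant - discordant pairs) / total pairs."""
    n = len(p)
    s = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both rankings order items i and j alike.
        s += 1 if (p[i] - p[j]) * (q[i] - q[j]) > 0 else -1
    return 2 * s / (n * (n - 1))


# Hypothetical rankings of a 5-element gallery by two algorithms.
ranks_a = [1, 2, 3, 4, 5]
ranks_b = [2, 1, 3, 5, 4]
print(spearman_rho(ranks_a, ranks_b), kendall_tau(ranks_a, ranks_b))
```

Both statistics equal 1 for identical rankings and -1 for exactly reversed ones; values near 1 across algorithms are the kind of positive dependence the paragraph describes.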

Notice that the methods of averaging or combining ranks can be applied to biometric algorithms of different modalities, for example a face recognition algorithm and a fingerprint (or gait, or ear) recognition device. Jain et al. (1999) and Snelick et al. (2003) discuss several experimental studies of multimodal biometrics, in particular fusion techniques for face and fingerprint classifiers. These techniques can be useful in a verification problem, in which a person presents a set of biometric signatures and claims that these signatures belong to a particular identity.

The example considered in Section 4 comes from the Face Recognition Technology (FERET) program (Phillips et al., 2000) in which four recognition algorithms each produced rankings from galleries in three 1996 FERET datasets of facial images.

The authors are grateful to P. Grother and J. Phillips for these datasets.

Section snippets

Averaging of ranks via minimum distance

It is suggested to think of the action of an algorithm (its ranking) as a permutation π of N objects in the gallery. Thus π(i) is the rank given to the gallery element i; in particular, if π(i) = 1, then the item i is the closest image in the gallery to the given probe, i.e., its similarity score is the largest.
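As a minimal sketch of this representation, the following turns a list of similarity scores into the permutation π. The function name and the scores are hypothetical, gallery items are indexed from 0 in the code, and ties are broken by gallery index, which is an assumption of the sketch rather than something the text specifies.

```python
def ranks_from_scores(scores):
    """Return pi, where pi[i] is the 1-based rank of gallery item i.

    Rank 1 goes to the item with the largest similarity score.
    Ties are broken by gallery index (an assumption of this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    pi = [0] * len(scores)
    for rank, item in enumerate(order, start=1):
        pi[item] = rank
    return pi


# Hypothetical similarity scores of one probe against a gallery of N = 4.
scores = [0.31, 0.87, 0.12, 0.55]
pi = ranks_from_scores(scores)
print(pi)  # [3, 1, 4, 2]
```

Here item 1 has the largest score, so it receives rank 1 and is the closest gallery image to the probe.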

If the goal is to combine K independent algorithms whose actions $\pi_k$, $k = 1, \ldots, K$, can be considered as permutations of a gallery of size N, then the combined (average) ranking of observed

Linear aggregation

Since the matrix C has to be estimated and the numerical evaluation of (2) for large N can be difficult, one may look for a simpler aggregated algorithm.

Such an algorithm can be defined by the matrix P, which is a convex combination of the permutation matrices $P_1, \ldots, P_K$, $P = \sum_{j=1}^{K} w_j P_j$. The problem is that of assigning non-negative weights (probabilities) $w_1, \ldots, w_K$, such that $w_1 + \cdots + w_K = 1$, to the matrices $P_1, \ldots, P_K$. The fairness of all (dependent) algorithms can be interpreted as $E P_i = \mu$ with the same “central” matrix μ
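The convex combination of permutation matrices can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the paper: the orientation of the permutation matrix (rows are gallery items, columns are ranks), the hand-picked weights, and the use of the expected rank of each row to read a ranking off the matrix P. The function names are hypothetical.

```python
def permutation_matrix(pi):
    """P[i][r] = 1.0 if gallery item i receives rank r + 1, else 0.0."""
    n = len(pi)
    return [[1.0 if pi[i] == r + 1 else 0.0 for r in range(n)]
            for i in range(n)]


def linear_aggregate(rankings, weights):
    """Convex combination P = sum_j w_j P_j of permutation matrices."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(rankings[0])
    P = [[0.0] * n for _ in range(n)]
    for pi, w in zip(rankings, weights):
        Pj = permutation_matrix(pi)
        for i in range(n):
            for r in range(n):
                P[i][r] += w * Pj[i][r]
    return P


def expected_ranks(P):
    """Mean rank of each item when row i of P is read as a rank distribution."""
    return [sum((r + 1) * p for r, p in enumerate(row)) for row in P]


# Three hypothetical rankings of a 3-element gallery and illustrative weights.
rankings = [[1, 2, 3], [2, 1, 3], [1, 3, 2]]
weights = [0.5, 0.3, 0.2]
P = linear_aggregate(rankings, weights)
print(expected_ranks(P))
```

Because every $P_j$ is a permutation matrix and the weights sum to one, each row and each column of P sums to one, so P is doubly stochastic.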

Example: FERET data

In order to evaluate the proposed fusion methods, four face-recognition algorithms were selected for aggregation (I: MIT, March 95; II: USC, March 97; III: MIT, Sept 96; IV: UMD, March 97). In accordance with the Face Recognition Technology (FERET) protocol, these algorithms were run on three 1996 FERET datasets of facial images, dupII (D1), dupI training (D2), and dupI testing (D3) (Table 1), yielding similarity scores between gallery and probe images. These scores were used for training and

References (19)

  • L. Lam et al. Optimal combinations of pattern classifiers. Pattern Recog. Lett. (1995)
  • M.S. Bazaraa et al. Linear Programming and Network Flows. (1990)
  • L. Breiman. Population theory for boosting ensembles. Ann. Statist. (2004)
  • D.E. Critchlow. Metric Methods for Analyzing Partially Ranked Data. (1985)
  • P. Diaconis. Group Representations in Probability and Statistics. (1988)
  • T.K. Ho et al. Decision combination in multiple classifiers system. IEEE Trans. Pattern Anal. Mach. Intell. (1994)
  • A.K. Jain et al. Personal Identification in Networked Society. (1999)
  • A.K. Jain et al. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • J. Kittler et al. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. (1998)
There are more references available in the full text version of this article.
