Pattern Recognition

Volume 43, Issue 10, October 2010, Pages 3660-3673

Context-aware fusion: A case study on fusion of gait and face for human identification in video

https://doi.org/10.1016/j.patcog.2010.04.012

Abstract

Most work on multi-biometric fusion is based on static fusion rules. One prominent limitation of static fusion is that it cannot respond to changes in the environment or in the individual users. This paper proposes context-aware multi-biometric fusion, which can dynamically adapt the fusion rules to the real-time context. As a typical application, the context-aware fusion of gait and face for human identification in video is investigated. Two significant context factors that may affect the relationship between gait and face in the fusion are considered, i.e., view angle and subject-to-camera distance. Fusion methods adaptable to these two factors, based on either prior knowledge or machine learning, are proposed and tested. Experimental results show that the context-aware fusion methods perform significantly better than not only the individual biometric traits but also widely adopted static fusion rules, including SUM, PRODUCT, MIN, and MAX. Moreover, context-aware fusion based on machine learning shows superiority over that based on prior knowledge.

Introduction

Biometrics, the study of methods for uniquely recognizing humans based on one or more intrinsic physical or behavioral traits, has become an active research area as well as a widely adopted technique in many real applications. Typical biometric traits include face, gait, fingerprint, iris, voice, vein, hand geometry, etc. Most biometric systems currently deployed in real applications rely on a single source of information and are therefore called unimodal biometric systems. Such systems often suffer from practical problems like noisy sensor data, non-universality and/or lack of distinctiveness of the biometric trait, unacceptable error rates, and spoof attacks [1]. To alleviate these problems, multimodal biometric systems have been proposed, which combine evidence from different sources [2]. These sources might be multiple sensors [3], multiple classification algorithms [4], or multiple instances [5] of the same biometric trait, or multiple biometric traits themselves [6], [7]. Among them, systems based on multiple biometric traits, i.e., multi-biometric fusion, are generally believed to be more robust than the others [8] and have thus become the main form of multimodal biometric systems.

Up to the present, most work on multi-biometric fusion has been static fusion, i.e., the fusion rules are predefined and remain fixed while the system is running. In reality, however, the reliability of a biometric trait might vary with changes of context. Dey [9] defines context as “any information that can be used to characterize the situation of entities”. Human factors and the physical environment are regarded as the two most important aspects of context [10]. Examples of typical biometric traits and common influential context factors are tabulated in Table 1. As can be seen, the reliability of each biometric trait varies depending on certain context factors. If these traits are to be combined, then the relationship among them in the fusion should change accordingly.

However, static fusion cannot adapt to the changing environment and individual users, which might make multi-biometric systems unstable, unreliable, or even unusable in real applications. While adaptive information fusion has recently attracted much attention in several areas, little work has been done on context-aware multi-biometric fusion. Some recent work on quality-based multimodal biometrics [11], [12], [13], [14], [15], [16], [17], [18] can be viewed as the first few attempts toward context-aware multi-biometric fusion, since differences in data quality are usually caused by external context factors, such as sensor quality, illumination conditions, and background noise. In quality-based fusion, a quality assessment algorithm is needed to calculate a quality score. The assessment usually focuses on the biometric samples themselves, using quality measures calculated directly from the data, such as the signal-to-noise ratio [14], [11] and the high-frequency components of the Discrete Cosine Transform [17]. However, at least at present, a single quality assessment algorithm that deals with all influential context factors is still unrealistic. While data quality measures can be regarded as a proxy for knowledge about some external factors, there are certain advantages in tracing the variability back to its source context factors:

  1. Quality-based fusion may respond sluggishly to new sources of variability that are not captured by the quality measures. Adopting a ‘divide and conquer’ strategy and dealing with the source context factors individually could make the issue much more manageable than trying to pool all variability into a single quality score.

  2. Additional devices, such as a laser distance sensor, can be used to detect variations of context; such devices are usually much more reliable than quality assessment algorithms.

  3. The combination of context factors can be much more complex than a single quality score, e.g., the context-aware fusion based on a neural network that will be proposed in Section 3.3.2, loosely illustrated by the sketch after this list.
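As a loose illustration of point 3 only, the sketch below shows a small feed-forward network that maps raw context factors (here, view angle and subject-to-camera distance) directly to normalized fusion weights. It is a hypothetical toy, not the network of Section 3.3.2: the parameters are untrained placeholders and the input scaling constants are made up.

    import numpy as np

    def softmax(z):
        # Normalize raw outputs into fusion weights that sum to 1.
        e = np.exp(z - z.max())
        return e / e.sum()

    class ContextWeightNet:
        # Hypothetical one-hidden-layer network: context factors -> fusion weights.
        # Input:  [view_angle_deg, subject_to_camera_distance_m]
        # Output: [w_gait, w_face], used to weight the two matching scores.
        # The parameters below are random placeholders; a real system would
        # train them on labelled fusion data.
        def __init__(self, n_hidden=8, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(scale=0.5, size=(2, n_hidden))
            self.b1 = np.zeros(n_hidden)
            self.W2 = rng.normal(scale=0.5, size=(n_hidden, 2))
            self.b2 = np.zeros(2)

        def fusion_weights(self, view_angle_deg, distance_m):
            x = np.array([view_angle_deg / 90.0, distance_m / 10.0])  # crude scaling
            h = np.tanh(x @ self.W1 + self.b1)
            return softmax(h @ self.W2 + self.b2)

    net = ContextWeightNet()
    w_gait, w_face = net.fusion_weights(view_angle_deg=45, distance_m=6.0)
    fused = w_gait * 0.71 + w_face * 0.42  # example (normalized) matching scores

In practice such a mapping would be trained so that the emitted weights maximize identification accuracy for each observed context.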

Moreover, context means more than the factors that determine quality. According to a draft of the INCITS Biometric Sample Quality Standard [19], the quality of a biometric sample has three components, namely the character of the source, the fidelity of the sample to the source, and the utility of the sample within a biometric system, all of which are reflected in the accuracy of the biometric system. The basic rule behind quality-based fusion is to give more weight to the more accurate (higher-quality) biometric in the fusion. However, a good fusion strategy depends on more than the accuracy of the individual biometric traits. According to classifier ensemble theory [20], a good ensemble combines accurate and diverse classifiers, and multi-biometric fusion is a special case of a classifier ensemble. Thus, in addition to the accuracy (quality) of each individual biometric, the diversity of the different biometrics should be considered as well. For example, in the fusion of left-hand fingerprint, right-hand fingerprint, and face, the two fingerprint biometrics are usually more accurate than the face as individual traits, but the similarity between the two fingerprints might prevent them from compensating for each other. Considering diversity, the best fusion strategy might give higher weights to one fingerprint and the face while assigning a lower weight to the other fingerprint. Since both accuracy and diversity are considered in context-aware fusion while only accuracy is considered in quality-based fusion, the latter can be viewed as part of the former.
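The accuracy/diversity trade-off invoked above can be made precise with the ambiguity decomposition of Krogh and Vedelsby [20], stated here for squared-error regression ensembles; its use for biometric score fusion is an analogy rather than a result of this paper. For a weighted ensemble

    \bar{f}(x) = \sum_\alpha w_\alpha f_\alpha(x), \qquad \sum_\alpha w_\alpha = 1, \quad w_\alpha \ge 0,

the generalization error decomposes as

    E = \bar{E} - \bar{A}, \qquad
    \bar{E} = \sum_\alpha w_\alpha E_\alpha, \qquad
    \bar{A} = \sum_\alpha w_\alpha \int p(x)\, \bigl( f_\alpha(x) - \bar{f}(x) \bigr)^2 \, dx \;\ge\; 0.

Since \bar{A} \ge 0, the ensemble is never worse than the weighted average of its members, and for a fixed average individual error \bar{E} the ensemble error falls as the members grow more diverse; this is the sense in which a good fusion strategy must weigh diversity alongside accuracy (quality).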

To provide more general solutions, a comprehensive framework of context-aware multi-biometric fusion is proposed in this paper. As a typical application, the context-aware fusion of gait and face in video for human identification is investigated. The application scenario is an intelligent identification system deployed in homes or workplaces, which can automatically identify the people on a watch list. Both gait and face are unobtrusive biometric traits and can be obtained simultaneously by most video surveillance systems. The main context factors that affect the relationship between gait and face in the fusion are the view angle and the distance from the subject to the camera. Usually the side view is the best view angle for gait recognition because more motion characteristics can be captured from this angle, while face recognition prefers the frontal view because the whole face is visible from this angle. Moreover, gait recognition is not very sensitive to the subject-to-camera distance (although when the subject is too far away, the motion characteristics might be partly lost due to low resolution), while face recognition performs better when the subject is close to the camera because face images of higher resolution can be obtained. Thus when the view angle or the subject-to-camera distance changes, the relative importance of gait and face in the fusion should change accordingly. Methods of incorporating these two context factors into the fusion process with different degrees of freedom, based either on prior knowledge or on machine learning, are proposed and compared with conventional static fusion rules such as SUM, PRODUCT, MIN, and MAX. Note that there are other context factors that might affect the accuracy of gait recognition and face recognition, such as illumination. But the influence of illumination on both biometrics is similar, i.e., both gait and face perform better under good illumination and worse under poor illumination. Thus although illumination variation might affect the accuracy of the individual biometrics (and consequently the accuracy of the fusion), it does not appreciably change the relationship between them in the fusion. Since the focus of this paper is the influence of context factors on the relationship between modalities in the fusion, context factors like illumination are not considered in the fusion of gait and face.
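To make the prior-knowledge variant concrete, the following is a minimal sketch of score-level fusion whose gait/face weights follow the qualitative observations above: gait favours the side view (0°) and is largely distance-insensitive, while face favours the frontal view (90°) and short distances. The functional forms and constants are illustrative inventions, not the exact rules proposed in Section 3.

    def context_weights(view_angle_deg, distance_m):
        # Heuristic gait/face weights; 0 deg = lateral (side) view, 90 deg = frontal view.
        gait = 1.0 - view_angle_deg / 90.0   # gait reliability peaks at the side view
        face = view_angle_deg / 90.0         # face reliability peaks at the frontal view
        # Face degrades with distance (lower facial resolution); gait is far less sensitive.
        face *= max(0.2, 1.0 - distance_m / 15.0)
        total = gait + face
        return gait / total, face / total

    def fuse(gait_score, face_score, view_angle_deg, distance_m):
        # Context-aware weighted-SUM fusion of two normalized matching scores.
        w_gait, w_face = context_weights(view_angle_deg, distance_m)
        return w_gait * gait_score + w_face * face_score

For example, fuse(0.7, 0.4, view_angle_deg=0, distance_m=12.0) returns the gait score unchanged, since at the lateral view this heuristic assigns the face zero weight.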

The rest of this paper is organized as follows. The framework of context-aware multi-biometric fusion is proposed in Section 2. Then the context-aware fusion of gait and face is investigated in Section 3. Experiments are reported in Section 4. Finally, conclusions are drawn in Section 5.

Section snippets

Multi-biometric fusion adaptable to context

Suppose B consists of all information about the individual biometric traits used in a multi-biometric system; then static fusion can be expressed by the function

F = f_s(B),

where F is the fused information and the function f_s defines how the components in B are combined. The actual realizations of B and F depend on the fusion level. For instance, in feature-level fusion, B consists of all biometric feature vectors of one person, and F is the fused feature vector of this person. In score-level fusion, B …
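Although the snippet above is truncated, the static rules named in the Introduction (SUM, PRODUCT, MIN, MAX) have simple closed forms at the score level. The sketch below contrasts them with a context-aware rule whose weights are produced by a function of the context rather than being constants; it assumes comparably normalized matching scores (e.g., min-max normalized), an assumption of this illustration rather than a statement about the paper's experimental protocol.

    import numpy as np

    def static_fusion(scores, rule="SUM"):
        # Classical static score-level fusion rules over normalized scores.
        s = np.asarray(scores, dtype=float)
        return {"SUM": s.sum(), "PRODUCT": s.prod(),
                "MIN": s.min(), "MAX": s.max()}[rule]

    def context_aware_fusion(scores, context_to_weights, context):
        # F = f(B, C): the combination itself depends on the context C.
        s = np.asarray(scores, dtype=float)
        w = np.asarray(context_to_weights(context), dtype=float)
        return float(w @ s / w.sum())

Here context_to_weights may be any mapping from the observed context to modality weights, e.g., the rule-based heuristic or the small network sketched in Section 1.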

Context-aware fusion of gait and face

As a typical application of context-aware multi-biometric fusion, the adaptive fusion of gait and face in video is investigated in this section. Both gait and face can be extracted from the same source, i.e., the video images, without the need for additional sensors. This makes it possible to study multi-biometric fusion based on existing gait-recognition databases, rather than on the rare multi-biometric databases or on virtual ‘chimeric persons’. The data used in this paper are the Dataset A and …

Datasets

The data used in the experiments are Dataset A (the former NLPR Gait Database [37]) and a subset of Dataset B of the CASIA Gait Database [21]. There are 20 different subjects in Dataset A. All the videos are captured in an outdoor environment. Each subject walks along a straight-line path back and forth twice at three different angles between the path and the image plane: lateral (0°), oblique (45°), and front/back (90°). Using the convention shown in Fig. 4, the view angles θ included …

Conclusion and discussion

This paper proposes context-aware multi-biometric fusion. Up to the present, most existing work on multi-biometric fusion has been based on static fusion. In contrast, context-aware fusion can perceive changes in external factors and dynamically adapt the fusion rule to those changes. To illustrate the advantages of context-aware fusion, the fusion of gait and face in video, adaptable to view angle and subject-to-camera distance, is investigated. Several context-aware fusion …

Acknowledgments

This work was partially supported by the Australian Research Council Discovery Grant (DP0987421), the National Science Foundation of China (60905031), and the Jiangsu Science Foundation (BK2009269). The authors would like to thank the associate editor, Professor John Illingworth, and the anonymous reviewers for their comments and suggestions which greatly improved this paper.


References (47)

• A. Ross, A.K. Jain, Multimodal biometrics: an overview, in: Proceedings of the European Signal Processing Conference, ...
• R. Brunelli et al., Person identification using multiple cues, IEEE Trans. Pattern Anal. Mach. Intell. (1995).
• L. Hong et al., Integrating faces and fingerprints for personal identification, IEEE Trans. Pattern Anal. Mach. Intell. (1998).
• A.K. Dey, Understanding and using context, Pers. Ubiquitous Comput. (2001).
• S. Chu, M. Yeung, L. Liang, X. Liu, Environment-adaptive multi-channel biometrics, in: Proceedings of the International ...
• J. Fiérrez-Aguilar, Y. Chen, J. Ortega-Garcia, A.K. Jain, Incorporating image quality in multi-algorithm fingerprint ...
• H.P.-S. Hui, H.M. Meng, M.-W. Mak, Adaptive weight estimation in multi-biometric verification using fuzzy decision ...
• O. Fatukasi, J. Kittler, N. Poh, Quality controlled multimodal fusion of biometric experts, in: Proceedings of the 12th ...
• K. Nandakumar, Y. Chen, A.K. Jain, S.C. Dass, Quality-based score level fusion in multibiometric systems, in: ...
• U. Park, A.K. Jain, A. Ross, Face recognition in video: adaptive fusion of multiple matchers, in: Proceedings of the ...
• N. Poh, S. Bengio, Improving fusion with margin-derived confidence in biometric authentication tasks, in: Proceedings ...
• INCITS Project 1672-D, Biometric sample quality standard draft (revision 4), Technical Report INCITS/M1/06-0948, ...
• A. Krogh, J. Vedelsby, Neural network ensembles, cross validation, and active learning, in: G. Tesauro, D.S. Touretzky, ...
Cited by (34)

• Biometric recognition by gait: A survey of modalities and features
  2018, Computer Vision and Image Understanding
  Citation excerpt: “Additional investigation is warranted, especially across multiple sensory modalities to determine which is least vulnerable to obfuscation and spoofing whilst being robust to covariates. There are a number of examples of combining gait features from a single modality with other biometrics (e.g., Geng et al., 2010; Kimura et al., 2014; Muramatsu et al., 2013; Shakhnarovich et al., 2001; Vera-Rodríguez et al., 2012; Vildjiounaite et al., 2007; Zhou et al., 2007). Depending on the application (e.g., smartphone security), a particular combination of biometrics (e.g., face and gait from accelerometry) may be advantageous.”

• QFuse: Online learning framework for adaptive biometric system
  2015, Pattern Recognition
  Citation excerpt: “Traditional multi-biometric systems work on static fusion rules which may not adapt itself to the dynamically changing environment and thus degrade the performance as the environment changes. Geng et al. [13] proposed a context aware fusion scheme that takes into account the viewing angle and distance of the subject from the camera to select an appropriate fusion scheme for improved performance. Abaza and Ross [14] proposed including image quality in the fusion scheme to enhance the performance in the presence of weak matchers or low quality input images.”

• An adaptive bimodal recognition framework using sparse coding for face and ear
  2015, Pattern Recognition Letters
  Citation excerpt: “Particularly, when one modality confronts severe data degeneration, multimodal system may perform worse than the unimodal system using the other modality. Given the relative independence between the face and the ear, the key to robust multimodal biometric is in effective biometric quality-based adaptive fusion, which is necessary to assign lower weight to the less reliable modality while assign higher weight to the good one [1,9]. Biometric quality-based fusion involves biometric quality assessment and dynamic weight selection.”

About the Author—XIN GENG received the B.Sc. (2001) and M.Sc. (2004) degrees in computer science from Nanjing University, China, and the Ph.D. (2008) degree in computer science from Deakin University, Australia. His research interests include computer vision, pattern recognition, and machine learning. He has published over 25 refereed papers in these areas, including those published in prestigious journals and top international conferences. He has been a guest editor of several international journals, such as PRL and IJPRAI. He has served as a program committee member for a number of international conferences. He is also a frequent reviewer for various international journals and conferences.

About the Author—KATE SMITH-MILES is a professor and head of the School of Mathematical Sciences at Monash University, Australia. She has held Chairs in three disciplines—Mathematical Sciences, Information Technology, and Engineering—and is involved in many cross-disciplinary research projects. Kate obtained a B.Sc. (Hons) in mathematics and a Ph.D. in electrical engineering, both from the University of Melbourne, Australia. She has published two books on neural networks and data mining applications, and over 180 refereed journal and international conference papers in the areas of neural networks, combinatorial optimization, intelligent systems, and data mining. She is on the editorial board of several international journals, including IEEE Transactions on Neural Networks, and has been involved in organizing numerous international conferences in the areas of data mining, neural networks, and optimization. She is a frequent reviewer of international research activities, including grant applications in Canada, the UK, Finland, Hong Kong, Singapore, and Australia, refereeing for international research journals, and Ph.D. examinations. From 2007 to 2008 she was Chair of the IEEE Technical Committee on Data Mining (IEEE Computational Intelligence Society). In addition to her academic activities, she also regularly acts as a consultant to industry in the areas of optimization, data mining, and intelligent systems.

About the Author—LIANG WANG received the Ph.D. degree from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CAS), China, in 2004. Currently, he is a lecturer with the Department of Computer Science, University of Bath, UK.

His major research interests include machine learning, pattern recognition, computer vision, multimedia processing, and data mining. He has published widely in highly ranked international journals such as IEEE TPAMI, IEEE TIP, IEEE TKDE, IEEE TCSVT, IEEE TSMC, CVIU, and PR, and at leading international conferences such as CVPR, ICCV, and ICDM. He has received several honors and awards, such as the Special Prize of the Presidential Scholarship of the Chinese Academy of Sciences and the Research Commendation from the University of Melbourne in recognition of excellent research. He is currently a senior member of the IEEE (Institute of Electrical and Electronics Engineers) as well as a member of the IEEE Computer Society, the IEEE Communications Society, and the BMVA (British Machine Vision Association).

He has served more than 20 major international journals and more than 40 major international conferences and workshops. He is an associate editor of IEEE Transactions on Systems, Man and Cybernetics—Part B, International Journal of Image and Graphics (WorldSci), International Journal of Signal Processing (Elsevier), Neurocomputing (Elsevier), and International Journal of Cognitive Biometrics (Inderscience). He is a leading guest editor of three special issues appearing in PRL (Pattern Recognition Letters), IJPRAI (International Journal of Pattern Recognition and Artificial Intelligence), and IEEE TSMC-B, as well as a co-editor of five edited books. He has also co-chaired one invited special session and five international workshops.

About the Author—MING LI received his B.Sc. (1987), M.Sc. (1990), and Ph.D. (2004) degrees from Peking University (China), the Chinese Academy of Sciences (China), and the University of Technology, Sydney (Australia), respectively. He is a lecturer in the School of Information Technology at Deakin University (Australia). He has (co)authored more than 30 peer-reviewed technical publications and technical reports. His research areas include pattern recognition, routing in mobile ad hoc networks, delay-tolerant networks, and distributed computing.

About the Author—QIANG WU received the B.Eng. and M.Eng. degrees in electronic engineering from Harbin Institute of Technology, China, in 1996 and 1998, and the Ph.D. degree in computing science from the University of Technology, Sydney, Australia, in 2004. In 2003, he joined the School of Computing and Communications, University of Technology, Sydney (UTS), Australia, where he is currently a senior lecturer. He has been a member of the iNEXT research centre at UTS since 2007. He has held visiting appointments in Japan and China and was on the organizing committees of numerous international workshops and conferences. He was a guest editor of Pattern Recognition Letters for a special issue on Image/Video-Based Pattern Analysis and HCI Applications. He has also served as a reviewer for several international journals such as IEEE TSMC-B, PRL, IJPR, and the EURASIP Journal on Image and Video Processing. He is a member of the IEEE.
