Elsevier

Pattern Recognition Letters

Volume 107, 1 May 2018, Pages 17-24

CrowdFaceDB: Database and benchmarking for face verification in crowd

https://doi.org/10.1016/j.patrec.2017.12.028

Highlights

  • A large crowd video face dataset, termed as CrowdFaceDB, is prepared.

  • Ground truth annotations, along with benchmarking results of popular face detection and verification algorithms, are provided.

  • The results on CrowdFaceDB reflect the poor state of face detection and verification algorithms in unconstrained environments.

  • An end-to-end evaluation package is provided to facilitate benchmarking for both face detection and recognition.

Abstract

Face recognition research has benefited from the availability of challenging face databases, and benchmark results on popular databases show very high performance on single-person-per-image/video databases. However, in real-world surveillance scenarios, the environment is unconstrained and the videos are likely to record multiple subjects within the field of view. In such crowd surveillance videos, both face detection and recognition are still considered onerous tasks. One of the key factors behind the limited research in this direction is the unavailability of benchmark databases. This paper presents the CrowdFaceDB video face database, which fills the gap in unconstrained face recognition for crowd surveillance. The twofold contributions are: (1) developing an unconstrained crowd video face database of over 250 subjects, and (2) creating a benchmark protocol and performing baseline experiments for both face detection and verification. The experimental results showcase the exigent nature of crowd surveillance and the limitations of existing algorithms/systems.

Introduction

Research in face recognition started with cooperative individuals in constrained environments and is now attempting to address uncooperative individuals in unconstrained environments, such as surveillance. One of the key contributing factors to modern face recognition algorithms is the availability of challenging databases. Fig. 1 shows sample images from some of the most challenging video face databases currently used for benchmarking. Most of these databases comprise videos in which every frame contains one or two individuals performing certain actions in semi-controlled environments. However, the ultimate application of face recognition, i.e., surveillance in crowds, is significantly more challenging and requires databases in which each frame contains multiple individuals with varying environments and actions. Current surveillance applications require face recognition algorithms to recognize face images in challenging crowd settings, as shown in Fig. 2. Further, these applications require algorithms to handle variations due to low resolution, noise, pose, expression, and illumination, along with multiple subjects in a frame. The problem is exacerbated when both the gallery and the probe are captured in unconstrained conditions. Table 1 lists the publicly available face databases used for benchmarking video-based face recognition algorithms. A brief summary of these databases is presented below:

  • 1.

    Face-In-Action (FIA) by Goh et al. [5]: The FIA database was specially created for a border-security passport-checking application and contains videos that require user cooperation. It includes 6470 videos covering a total of 180 different subjects; however, each video contains only one subject.

  • 2.

    Honda UCSD by Lee et al. [12]: The Honda UCSD dataset serves the dual purpose of face tracking and face recognition. The dataset was created in a constrained manner with the subjects' cooperation.

  • 3.

    ChokePoint by Wong et al. [19]: The ChokePoint database is designed to address person identification/verification under real-world surveillance conditions using prevailing technologies. It contains 48 videos pertaining to 54 subjects.

  • 4.

    YouTube Faces (YTF) by Wolf et al. [18]: The YTF dataset was created with the primary purpose of studying face recognition in unconstrained environments. It contains 3425 videos of 1595 subjects, composed of celebrity videos collected from YouTube with the constraint that only one subject is present in each video.

  • 5.

    Point and Shoot Challenge (PaSC) by Beveridge et al. [2]: The PaSC database contains videos captured using hand-held and high-definition devices. The dataset encompasses 2802 videos of 265 subjects.

  • 6.

    SN-Flip by Barr et al. [1]: The SN-Flip database was created with the requirement of having multiple subjects in one video sequence. It includes 28 videos of 190 subjects.

  • 7.

    IARPA Janus Benchmark Datasets: IJB-A [10] and IJB-B [17] include face and non-face images and videos to facilitate face detection and recognition challenges. IJB-B, which is a superset of IJB-A, includes 7011 videos of 1845 subjects.

The challenges present in some of these databases, such as Honda UCSD, have been addressed, and 100% accuracy has been achieved. On the other hand, challenging databases such as YouTube Faces [18] and the Point and Shoot Challenge [2] are used to enhance the capabilities of modern algorithms. However, these databases do not help us understand the performance of current face recognition algorithms on unconstrained videos of crowds, i.e., videos with two or more subjects each.

It is our assertion that there is significant scope for improving face recognition performance in unconstrained environments, especially crowd scenarios. Fig. 3 shows some of the challenges of crowd videos that make face detection and recognition difficult. Specifically,

  • in unconstrained environments, it is not easy to achieve an acceptable level of face detection performance due to variations in illumination, pose, and occlusion;

  • in low-quality videos, it is difficult to differentiate between the face and the background; therefore, both face detection and recognition are challenging;

  • activities performed by different individuals in a video can lead to occluded faces, and such occlusions make detection and recognition difficult for automated processing; and

  • at times, the subject may be at a distance from the sensor, so the face region can be small; such variations make both detection and recognition very difficult.

To promote face detection and recognition (verification) research in challenging crowd scenarios, this paper presents CrowdFaceDB: an unconstrained video face database. The key contributions of this research are:

  • 1.

    The CrowdFaceDB dataset, which includes a total of 385 crowd videos pertaining to 257 subjects. The database includes manually annotated facial landmark points for every frame containing one or more faces. Along with the videos and landmark points, a set of protocols and an end-to-end MATLAB software package are provided to evaluate the performance of face verification algorithms on this dataset.

  • 2.

    A face detection baseline, obtained by comparing manual annotations against four publicly available implementations: (1) the Viola-Jones [16] face detector (MATLAB open source), (2) the HOG-descriptor-based C++ open-source library dlib [9], (3) face detection aided by fiducial points [4], and (4) Faster R-CNN [14], [15].

  • 3.

    A face verification baseline, with results reported for OpenBR [11], VGG-Face [13], and a commercial off-the-shelf system, FaceVacs.

Section snippets

CrowdFaceDB for face verification in crowd

The proposed CrowdFaceDB dataset contains 385 videos (50,152 frames) of 257 subjects, captured at different locations; each video contains up to 14 subjects. The videos are recorded using handheld devices, without mounting them on a tripod or similar structure. Consent for collecting these videos was obtained from all the subjects. Fig. 4 shows samples from the database, and dataset statistics are summarized in

Evaluation protocol and package

To use the database and evaluate the performance of face detection and recognition (verification) algorithms, we have created independent evaluation packages for both tasks. The packages are designed to make the overall evaluation process convenient and user friendly. The database, along with the evaluation packages, will be made available to the research community at http://iab-rubric.org/resources.html.
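
The released evaluation package is in MATLAB; to illustrate the standard verification metric such protocols report, the following is a minimal Python sketch (the function name and the score lists are hypothetical, not part of the released package) of computing the true accept rate at a fixed false accept rate from genuine and impostor similarity scores:

```python
def tar_at_far(genuine, impostor, target_far):
    """True-accept rate at a given false-accept rate.

    genuine  -- similarity scores for same-subject pairs
    impostor -- similarity scores for different-subject pairs
    """
    # Sweep decision thresholds over all observed scores, highest first,
    # and keep the best TAR whose FAR stays within the target.
    thresholds = sorted(set(genuine) | set(impostor), reverse=True)
    best = 0.0
    for t in thresholds:
        far = sum(s >= t for s in impostor) / len(impostor)
        if far > target_far:
            break
        best = sum(s >= t for s in genuine) / len(genuine)
    return best
```

For example, with genuine scores [0.9, 0.6, 0.3] and impostor scores [0.8, 0.5, 0.2], the lowest threshold that keeps the false accept rate within roughly one third accepts two of the three genuine pairs.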

Face detection and recognition: baseline results

Along with the manually annotated ground truth for both detection and recognition, baseline evaluations with automatic algorithms are also performed on the CrowdFaceDB dataset. The following existing face detection algorithms are used:

  • Viola Jones Face Detector [16]

  • Faster R-CNN face detector [14], [15]

  • Face detection aided by fiducial points [4]

  • Face detection based on Histogram of Oriented Gradient (HOG) [9]
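
Detection baselines such as these are typically scored against the manual annotations by overlap-based matching. The sketch below is a generic Python illustration, assuming (x, y, w, h) boxes and a 0.5 intersection-over-union threshold; these are common conventions, not necessarily the paper's exact evaluation criterion:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def match_detections(detections, ground_truth, thr=0.5):
    """Greedy matching: a detection is a true positive if it overlaps
    a still-unmatched ground-truth box with IoU >= thr."""
    unmatched = list(ground_truth)
    tp = 0
    for d in detections:
        best = max(unmatched, key=lambda g: iou(d, g), default=None)
        if best is not None and iou(d, best) >= thr:
            unmatched.remove(best)
            tp += 1
    # Returns (true positives, false positives, missed faces).
    return tp, len(detections) - tp, len(unmatched)
```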

For establishing the baseline face verification performance, the results of three matchers

Usage for CrowdFaceDB dataset

The proposed CrowdFaceDB dataset can be used for evaluating the performance of face detection, face recognition (verification and identification), and re-identification algorithms.

  • Face Detection: With the availability of manual annotations, the database can be used for evaluating the performance of face detection algorithms in crowd.

  • Face Verification and Identification: With predefined training-testing splits and protocols, the dataset can be useful for evaluating the frame-to-video,
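
Frame-to-video protocols of this kind are commonly evaluated by comparing a probe-frame descriptor against every frame of a gallery video and fusing the per-frame similarities into a single video-level score; max fusion is one common rule. A hypothetical Python sketch (the released package's actual fusion rule and features may differ):

```python
def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(x * x for x in v) ** 0.5
    return dot / (nu * nv)

def frame_to_video_score(probe_feat, gallery_feats, similarity=cosine):
    """Fuse per-frame similarities into one video-level score (max rule)."""
    return max(similarity(probe_feat, g) for g in gallery_feats)
```

Max fusion favors the single best-matching gallery frame, which suits crowd footage where a face may be occluded or poorly resolved in most frames; mean fusion is the usual alternative when frame quality is uniform.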

Conclusion

Face recognition from video in unconstrained environments has attracted considerable research interest due to its various applications. Multiple frames in a video provide temporal and intra-class variations that can be leveraged for efficient face recognition. While existing research has primarily focused on a single person per video, one of the key applications of face recognition is crowd surveillance, where multiple subjects appear in the same frame/video. In order to instigate research in this

Acknowledgements

This research is supported through a grant from the Ministry of Electronics and Information Technology, Government of India. T.I. Dhamecha was also partially supported through a TCS PhD Fellowship. M. Vatsa and R. Singh are also partially supported by the Infosys Center for Artificial Intelligence, IIIT-Delhi, India.

References (19)

  • M. Everingham et al.

    Taking the bite out of automated naming of characters in TV video

    Image Vis. Comput.

    (2009)
  • K.C. Lee et al.

    Visual tracking and recognition using probabilistic appearance manifolds

    J. Comput. Vision Image Underst.

    (2005)
  • J.R. Barr et al.

    Active clustering with ensembles for social structure extraction

    IEEE Winter Conference on Applications of Computer Vision

    (2014)
  • J.R. Beveridge et al.

    The challenge of face recognition from digital point-and-shoot cameras

    IEEE International Conference on Biometrics: Theory, Applications and Systems

    (2013)
  • T.I. Dhamecha et al.

    Annotated crowd video face database

    2015 International Conference on Biometrics

    (2015)
  • R. Goh et al.

    The CMU face in action (FIA) database

    Analysis and Modelling of Faces and Gestures

    (2005)
  • G. Goswami et al.

    MDLFace: memorability augmented deep learning for video face recognition

    IEEE/IAPR International Joint Conference on Biometrics

    (2014)
  • G. Goswami et al.

    Face verification via learned representation on feature-rich video frames

    IEEE Trans. Inf. Forensics Secur.

    (2017)
  • V. Jain et al.

    FDDB: a benchmark for face detection in unconstrained settings

    Technical Report, UM-CS-2010-009

    (2010)


Equal contributions by student authors.
