Elsevier

Applied Soft Computing

Volume 53, April 2017, Pages 168-180

Face detection and recognition in an unconstrained environment for mobile visual assistive system

https://doi.org/10.1016/j.asoc.2016.12.035

Highlights

  • We present a convolutional neural network based face detection and recognition system for a mobile application that can be used to aid the visually impaired.

  • A custom video dataset is created to evaluate the performance of the proposed system. This can also motivate further research in the area of unconstrained face recognition while the camera and subjects are in motion.

  • We present a generic framework for implementation of the system with smartphones and wearable devices.

  • The results are promising in adequate lighting conditions. The challenge lies in the recognition of faces in low-light conditions and reduction of false positives.

Abstract

We present a visual assistive system that features mobile face detection and recognition in an unconstrained environment from a mobile source using convolutional neural networks. The goal of the system is to effectively detect individuals who approach while facing the person equipped with the system. Face detection and recognition become very difficult in this setting because the user's movement shakes the camera, which introduces motion blur and noise into the input of the visual assistive system. Due to the shortage of related datasets, we create a dataset of videos captured from a mobile source that features motion blur and noise from camera shakes. This makes the application a particularly challenging case of face detection and recognition in unconstrained environments. The performance of the convolutional neural network is further compared with that of a cascade classifier. The results show promising performance in daylight and artificial lighting conditions, while moonlight conditions remain challenging and false positives need to be reduced in order to develop a robust system. We also provide a framework for implementing the system with smartphones and wearable devices, which supply video input and deliver auditory notifications that guide the visually impaired.

Introduction

Computer vision algorithms execute some of the most computationally intensive tasks in problems such as pattern recognition and motion analysis [1]. Even simple object detection algorithms [2], [3] require a significant amount of computational power due to the amount of data that needs to be processed in large-scale applications. Modern desktop computers are able to execute these applications in real-time without any major issues; the challenge is for mobile applications, where computationally intensive tasks produce heat and rapidly consume limited battery power. Mobile devices can harness the full power of real-world computer vision applications when applications are built taking these limitations into account [4].

Mobile object detection systems have a wide range of applications due to their portability [5], [6]. While detection of static objects is in general a relatively easy task, the detection of moving objects is more challenging [1]. Examples of mobile object detection include assistive systems for disabled persons [6] and iris recognition systems [5]. Motion introduces major difficulties in computer vision applications, including blur, constant scale and position changes, obstructions, and illumination changes [3]. Advanced detection methods such as neural networks are required to account for these challenges and achieve satisfactory performance [7], [8]. The SmartVision prototype [9] is an example of a mobile-based assistive system that provides navigation for disabled persons. It used a combination of computer vision, a geographic information system and a global positioning system for object, obstacle and path detection. Moreover, Willis et al. presented a mobile-based assistive system that allowed users to navigate an environment using a radio frequency identification (RFID) tag grid [10]. The RFID tags in this grid were programmed with coordinates and descriptions of the surroundings to provide navigation to users. Furthermore, a mobile iris recognition system has been presented that provided pupil and iris segmentation with a detection rate of 99% [11].

Neural networks consist of interconnected processors called neurons, which are loosely modelled after biological neurons [8]. Convolutional neural networks (CNNs) are specialised neural networks that are primarily designed for image recognition tasks [12]. These tasks include face detection, expression recognition, object detection and object recognition [13], [14], [15].
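To make the operation at the core of a CNN concrete, the following minimal sketch slides a small filter over a tiny image in plain Python. The patch values, the `vertical_edge` filter and the function name `conv2d_valid` are illustrative assumptions and are not taken from the paper; a real CNN learns its filter weights during training and stacks many such layers.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most CNN libraries, this computes cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x4 grayscale patch: dark on the left, bright on the right.
patch = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Hand-picked vertical-edge filter (an assumption); a trained CNN learns such weights.
vertical_edge = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]

feature_map = conv2d_valid(patch, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where the dark-to-bright edge sits, which is the sense in which a convolutional layer responds to local patterns in an image.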

CNNs are well suited to difficult recognition and detection problems [12] and can also be applied to large-scale video classification problems [16]. However, they have mostly been deployed for constrained and indoor vision applications that do not suffer from the motion blur and noise that result from a moving camera. Therefore, the challenge is to deploy them on mobile devices. A cloud-based support system can address the problem of portability and computation power; however, a good internet connection would be required for real-time implementation. Although mobile face detection and recognition has been gaining popularity [17], our review of the literature indicates that little work has been done on mobile face detection and recognition in unconstrained environments [18], [19]. Mobile face detection and recognition involves detecting and recognising stationary and moving subjects from a mobile source, which leads to input that contains motion blur and noise.
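As a rough illustration of why camera movement degrades the input, the sketch below models horizontal motion blur on a single image row as a simple moving average. This is a simplified assumption for illustration, not the blur model of the paper's dataset; the function name `horizontal_motion_blur` and the sample values are hypothetical.

```python
from typing import List


def horizontal_motion_blur(row: List[float], length: int) -> List[float]:
    """Average each pixel with the `length - 1` pixels to its right.

    Pixels past the end of the row are replaced by the last pixel (edge repetition).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    last = len(row) - 1
    return [
        sum(row[min(i + k, last)] for k in range(length)) / length
        for i in range(len(row))
    ]


# Hypothetical scanline with a sharp dark-to-bright edge.
sharp = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# A longer blur length stands in for faster camera movement during exposure.
blurred = horizontal_motion_blur(sharp, length=3)
print(blurred)
```

The sharp step in `sharp` becomes a gradual ramp in `blurred`, so edge-like facial features that a detector relies on are spread over several pixels.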

This paper presents a visual assistive system that features mobile face detection and recognition in an unconstrained environment from a mobile source using CNNs. The goal of the system is to effectively detect and recognise individuals who approach while facing the person equipped with the system. Due to the shortage of related datasets, we present a dataset of videos captured from a mobile camera in an unconstrained environment that features motion blur and noise. This makes the application a particularly challenging case of face detection and recognition in unconstrained environments. Detection and recognition performance is evaluated using CNNs and cascade classifiers in different lighting conditions, which include artificial light, daylight and moonlight.
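A per-condition evaluation of this kind can be sketched as follows. The metrics used by the authors are not stated in this excerpt; precision and recall are assumed here because they are a common choice when false positives matter. The class name `DetectionCounts` is hypothetical and all counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    true_positives: int   # faces correctly detected
    false_positives: int  # non-faces reported as faces
    false_negatives: int  # faces that were missed

    def precision(self) -> float:
        """Fraction of reported detections that are real faces."""
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    def recall(self) -> float:
        """Fraction of real faces that were detected."""
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0


# Made-up counts for illustration only; these are NOT results from the paper.
counts_by_lighting = {
    "artificial light": DetectionCounts(8, 2, 2),
    "daylight": DetectionCounts(9, 1, 1),
    "moonlight": DetectionCounts(3, 3, 7),
}

for condition, counts in counts_by_lighting.items():
    print(
        f"{condition}: precision={counts.precision():.2f}, "
        f"recall={counts.recall():.2f}"
    )
```

Reporting both values per lighting condition separates the two failure modes: low precision indicates many false positives, while low recall indicates missed faces.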

The proposed approach contributes to a larger system designed to aid visually impaired persons through mobile face detection and recognition. We also provide a framework for implementation of the system with smartphones and wearable devices for video input and auditory notification from the system. This paper extends previous work that focused on face detection with CNNs [20] and a mobile application framework [57].

The rest of the paper is organised as follows. We present the background and related work in Section 2 and the proposed mobile visual assistive system in Section 3. Section 4 describes the experimental design and presents the experimental results. Section 5 gives a discussion and Section 6 concludes the paper with directions for future work.

Section snippets

Face detection and recognition

Face detection and recognition are the processes of verifying faces in a given environment via computer vision algorithms that usually involve machine learning [15], [19]. Face recognition is performed in a wide range of conditions based on facial features, emerging technologies and learning algorithms [18], [21]. Some of these methods use emerging technologies such as infra-red camera [22] and involve three-dimensional face recognition systems [23]. Some of the related methods for this paper

Mobile visual assistive system

The proposed face detection and recognition approach for unconstrained environments is part of a mobile visual assistive system, delivered through a mobile application designed to assist visually impaired persons. We first describe the components of the system and their interaction, and then provide their implementation details. We note that the major component is the intelligent systems module, which can be implemented using either CNNs or cascade classifiers, depending on their performance from simulation

Simulation and results

We present a simulation study of the proposed intelligent systems module that features detection and recognition using CNNs. Cascade classifiers are used for further comparison of the results. We use the simulation study methodology and the video dataset with a wide range of conditions described in the previous section.

Discussion

In the experiments, CNNs were compared to cascade classifiers. The performance of the cascade classifier recognition system makes it suitable for deployment on mobile devices since it performs computation in a short amount of time and therefore uses less energy. This makes cascade classifiers suitable for use in scenarios where internet connectivity is not available due to weak mobile network signals, weather conditions, etc. In this case, the mobile application can use cascade classifier-based

Conclusions and future work

This paper presented a visual assistive system that features mobile face detection and recognition in an unconstrained environment from a mobile source using CNNs. The system's intelligent systems module included a detection module for face detection and a recognition module for face recognition. The performance of the modules was evaluated using CNNs and cascade classifiers in different lighting conditions, which included artificial light, daylight and moonlight. A dataset of videos captured from

References (57)

  • K.-H. Pong et al. Multi-resolution feature fusion for face recognition. Pattern Recognit. (2014)

  • J. Zhou et al. Moving vehicle detection for automatic traffic monitoring. IEEE Trans. Veh. Technol. (2007)

  • A. Ess et al. Moving obstacle detection in highly dynamic scenes

  • Q. Xie et al. Dynamic thermal management in mobile devices considering the thermal coupling between battery and application processor

  • P. Viola et al. Rapid object detection using a boosted cascade of simple features

  • J.M.H. du Buf et al. The smartvision navigation prototype for blind users. JDCTA Int. J. Dig. Content Technol. Appl. (2011)

  • S. Willis et al. RFID information grid for blind navigation and wayfinding

  • Y. LeCun et al. Convolutional networks and applications in vision

  • D.C. Cireşan et al. Flexible, high performance convolutional neural networks for image classification

  • Y. Sun et al. Deep convolutional network cascade for facial point detection

  • A. Karpathy et al. Large-scale video classification with convolutional neural networks

  • G.B. Huang et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49 (2007)

  • R. Jafri et al. A survey of face recognition techniques. J. Inf. Process. Syst. (2009)

  • C. Zhang et al. A survey of recent advances in face detection. Tech. Rep. MSR-TR-2010-66 (June 2010)

  • S. Chaudhry et al. Unconstrained Face Detection from a Mobile Source Using Convolutional Neural Networks (2016)

  • T. Ahonen et al. Face recognition with local binary patterns

  • H. Abdi et al. Principal component analysis. Wiley Interdiscip. Rev.: Comput. Stat. (2010)

  • R.A. Fisher. The use of multiple measurements in taxonomic problems. Ann. Eugen. (1936)