On surveillance for safety critical events: In-vehicle video networks for predictive driver assistance systems

https://doi.org/10.1016/j.cviu.2014.10.003

Highlights

  • A distributed camera-sensor system for driver assistance and situational awareness.

  • Systematic, comparative evaluation of cues for prediction of safety critical events.

  • Real-time prediction of overtaking and braking maneuvers.

  • Detailed temporal analysis of the utility of various cues for maneuver prediction.

  • Early prediction, 1–2 s before the maneuver, is demonstrated on real-world data.

Abstract

We study techniques for monitoring and understanding real-world human activities, in particular of drivers, from distributed vision sensors. Real-time and early prediction of maneuvers is emphasized, specifically overtake and brake events. Study of this particular domain is motivated by the fact that early knowledge of driver behavior, in concert with the dynamics of the vehicle and surrounding agents, can help to recognize dangerous situations. Furthermore, it can assist in developing effective warning and driver assistance systems. Multiple perspectives and modalities are captured and fused in order to achieve a comprehensive representation of the scene. Temporal activities are learned from a multi-camera head pose estimation module, hand and foot tracking, ego-vehicle parameters, lane and road geometry analysis, and surround vehicle trajectories. The system is evaluated on a challenging dataset of naturalistic driving in real-world settings.

Introduction

Distributed camera and sensor networks are needed for studying and monitoring agent activities in many domains of application [1]. Algorithms that reason over the multiple perspectives and fuse information have been developed with applications to outdoor or indoor surveillance [2]. In this work, multiple real-time systems are integrated in order to obtain temporal activity classification of video from a vehicular platform. The problem is related to other applications of video event recognition, as it requires a meaningful representation of the scene. Specifically, event definition and techniques for temporal representation, segmentation, and multi-modal fusion will be studied. This will be done with an emphasis on speed and reliability, which are necessary for the challenging real-world application of preventing car accidents and making driving and roads safer. Furthermore, in the process of studying the usability and discriminative power of each of the different cues, we gain insight into the underlying processes of driver behavior.

In 2012 alone, 33,561 people died in motor vehicle traffic crashes in the United States [3]. A majority of such accidents occurred due to an inappropriate maneuver or a distracted driver. In this work, we propose a real-time holistic framework for on-road analysis of driver behavior in naturalistic settings. Knowledge of the surround and vehicle dynamics, as well as the driver’s state, will allow the development of more efficient driver assistance systems. As a case study, we look into two specific maneuvers in order to evaluate the proposed framework. First, overtaking maneuvers will be studied. Lateral control maneuvers such as overtaking and lane changing represent a significant portion of the total accidents each year. Between 2004 and 2008, 336,000 such crashes occurred in the US [4]. Most of these occurred on a straight road in daylight, and most of the contributing factors were driver-related (i.e. due to distraction or inappropriate decision making). Second, we look at braking events, which are associated with longitudinal control and whose study also plays a key role in preventing accidents. Early recognition of dangerous events can aid in the development of effective warning systems. In this work we emphasize that the system must be extremely robust in order to: (1) engage only when needed, by maintaining a low false alarm rate, and (2) function at a high true positive rate, so that critical events, as rare as they may be, are not missed. In order to understand what the driver intends to do, a wide range of vision and vehicle sensors are employed to develop techniques that can satisfy these real-world requirements.

The requirement for robustness and real-time performance motivates us to study feature representation as well as techniques for recognition of temporal events. The study will focus on three main components: the vehicle, the driver, and the surround. The implications of this study are numerous. In addition to early warning systems, knowledge of the driver’s state allows for customization of the system to the driver’s needs, thereby mitigating further distraction caused by the system and easing user acceptance. By contrast, a system that is not aware of the driver may cause annoyance. Additionally, in a dangerous situation (e.g. overtaking without turning on the blinker), a warning could be conveyed to other approaching vehicles; for instance, the blinker could be turned on automatically.

Our goal is defined as follows: The prediction and early detection of overtaking and braking intent and maneuvers using driver, vehicle, and surround information.

In the vehicle domain, a few hundred milliseconds could signify an abnormal or dangerous event. To that end, we aim to model every piece of information suggesting an upcoming maneuver. In order to detect head motion patterns associated with visual scanning [5], [6], [7] under settings of occlusion and large head motion, a two-camera system for head tracking is employed. Subtle preparatory motion is studied using two additional cameras monitoring hand and foot motion. In addition to head, hand, and foot gesture analysis, sensors measuring vehicle parameters and surrounding vehicles are employed (Fig. 1). A gray-scale camera is placed so as to observe lane markings and road geometry, and a 360° color camera on top of the vehicle allows for panoramic analysis. Because visual challenges encountered in other surveillance domains, such as large illumination changes and occlusion, are also common in our data, the action analysis modules studied in this work generalize to other application domains as well.
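The paper does not include its fusion code; as a minimal, hedged sketch of how a two-camera head tracker can remain robust under occlusion and large head motion, the snippet below selects, per frame, the view whose tracker reports higher confidence. The function name, the yaw/confidence inputs, and the 0.5 validity threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fuse_head_pose(yaw_a, conf_a, yaw_b, conf_b, min_conf=0.5):
    """Per-frame selection between two head-pose estimates (illustrative).

    Each camera's tracker reports a yaw estimate (degrees) and a
    confidence in [0, 1]; the more confident view is kept per frame,
    and frames where both trackers fail are marked invalid.
    """
    yaw_a = np.asarray(yaw_a, dtype=float)
    yaw_b = np.asarray(yaw_b, dtype=float)
    conf_a = np.asarray(conf_a, dtype=float)
    conf_b = np.asarray(conf_b, dtype=float)

    fused = np.where(conf_a >= conf_b, yaw_a, yaw_b)  # pick the better view
    valid = np.maximum(conf_a, conf_b) >= min_conf    # neither view reliable?
    fused = np.where(valid, fused, np.nan)            # leave such frames as gaps
    return fused, valid
```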

We first review related literature in Section 2, while making a case for a holistic, multi-sensory fusion approach to driver understanding and prediction. Event definition and testbed setup are discussed in Sections 3 and 4, respectively. The different signals and feature extraction modules are detailed in Section 5. Two temporal modeling approaches for maneuver representation and fusion are discussed in Section 6, and the experimental evaluation (Section 8) demonstrates analysis of different cues and modeling techniques in terms of their predictive power.

Section snippets

Related research studies

In our specific application, prediction involves recognition of distinct temporal cues not found in the large, ‘normal’ driving class. Related research falls into three categories, roughly aligned with different temporal segments of the maneuver: trajectory estimation, inference, and intent prediction, with the first being the most common. In trajectory estimation, the driver is usually not observed, but IMU, GPS [8] and maps [9], vehicle dynamics [10], and surround sensors [11]

Event definition

Commonly, a lane change event or an overtake event (which includes a lane change) is defined to begin at the lane marker crossing. By contrast, in this work the beginning of an overtake event is defined earlier, at the onset of lateral motion. We note that there are additional ways to define a maneuver such as an overtake or a lane change (see [7]), and that our event start definition occurs significantly earlier than in many of the related research studies. For instance, techniques
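As a hedged illustration of the overtake-early definition, the sketch below marks the event start at the onset of sustained lateral motion estimated from a lane tracker's lateral-offset signal. The sampling rate, velocity threshold, and hold count are placeholders, not the paper's calibration.

```python
import numpy as np

def overtake_early_onset(lateral_offset, fps=25.0, vel_thresh=0.15, hold=5):
    """Return the frame index where sustained lateral motion begins.

    lateral_offset: per-frame offset (m) from the lane center. The event
    starts at the first frame of a run in which the absolute lateral
    velocity exceeds vel_thresh (m/s) for `hold` consecutive frames.
    """
    offset = np.asarray(lateral_offset, dtype=float)
    vel = np.gradient(offset) * fps       # finite-difference velocity (m/s)
    run = 0
    for t, moving in enumerate(np.abs(vel) > vel_thresh):
        run = run + 1 if moving else 0
        if run >= hold:
            return t - hold + 1           # first frame of the sustained run
    return None                           # no onset detected
```

By this definition the event start precedes the lane-marker crossing, which is what enables the earlier predictions evaluated later in the paper.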

Instrumented mobile testbed and dataset

A uniquely instrumented testbed vehicle is used in order to holistically capture the dynamics of the scene: the vehicle dynamics, a panoramic view of the surround, and the driver. Built on a 2011 Audi A8, the automotive testbed is outfitted with extensive auxiliary sensing for the research and development of advanced driver assistance technologies. Fig. 1 shows a visualization of the sensor array, consisting of vision, radar, lidar, and vehicle (CAN) data. The goal of the testbed buildup is to

Maneuver representation

In this section we detail the vision modules used in order to extract useful signals for analysis of activities.

Temporal modeling

A model for the signals extracted by the modules in Section 5 must address several challenges. First, signal structure must be captured efficiently in order to produce a good model of maneuvers. Second, the role of different modalities should be studied with an appropriate fusion technique. Two types of modeling schemes are studied in this work, one using a Conditional Random Field (CRF) [36] and the other using Multiple Kernel Learning (MKL) [37]. The limitations and advantages of these
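The paper's MKL formulation is not reproduced here; as a minimal stand-in under stated assumptions, the sketch below fuses per-modality features by summing weighted RBF kernels and training an SVM on the combined kernel. A true MKL solver would learn the kernel weights jointly with the classifier; here they are fixed, and all names and parameters are illustrative.

```python
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def train_fused_svm(views, y, weights, gamma=0.1):
    """Fit an SVM on a weighted sum of per-modality RBF kernels.

    views: list of (n_samples, d_m) feature matrices, one per cue
    (head, hands/feet, lane, CAN, surround). This is a fixed-weight
    approximation of MKL, which would also learn `weights`.
    """
    K = sum(w * rbf_kernel(X, X, gamma=gamma) for X, w in zip(views, weights))
    clf = SVC(kernel="precomputed")
    clf.fit(K, y)
    return clf

# At test time, each modality's kernel must be computed between test and
# training samples before applying the same weighted sum.
```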

Experimental setup

Several experiments are conducted in order to test the proposed framework for recognition of intent and prediction of maneuvers. As mentioned in Section 3, we experiment with two definitions for the beginning of an overtake event. An overtake event may be marked either when the vehicle crosses the lane marking or when the lateral movement begins. These are referred to as overtake-late and overtake-early, respectively. Normal driving is defined as events when the brake pedal was not engaged and no
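To make the protocol concrete, here is a small sketch (with placeholder frame rate and window lengths) of how a prediction window might be cut from a continuous signal so that it ends a chosen lead time before the labeled maneuver onset, as in the 1–2 s early-prediction evaluation:

```python
def window_before_onset(signal, onset_idx, fps=25.0, length_s=2.0, lead_s=1.0):
    """Cut a feature window ending `lead_s` seconds before the labeled
    maneuver onset, for testing prediction ahead of the event.
    Returns None when there is not enough history before the onset."""
    end = onset_idx - int(lead_s * fps)
    start = end - int(length_s * fps)
    if start < 0:
        return None
    return signal[start:end]
```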

Experimental evaluation

Temporal modeling: The first set of evaluations compares the choices of temporal features and temporal modeling. Each cue is first modeled independently in order to study its predictive power. The results for LDCRF and MKL under experiment 1a, overtake-late/brake, are shown in Fig. 9 for raw trajectory features. LDCRF demonstrates better predictive power than MKL when each modality is used independently. For instance, lane information provides better
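The temporal analysis in this section asks how early a detector's score crosses threshold before the maneuver begins. A hedged sketch of one way to compute such a curve (the threshold, lead-time grid, and data layout are assumptions, not the paper's exact procedure):

```python
import numpy as np

def tpr_vs_lead_time(scores, onsets, fps=25.0, thresh=0.5, horizon_s=3.0):
    """Fraction of maneuver trials scoring above `thresh` at each lead
    time before the labeled onset.

    scores: per-frame maneuver probabilities, one array per trial
    onsets: labeled onset frame index for each trial
    Trials lacking history at a given lead time count as misses.
    """
    leads = np.arange(0.0, horizon_s, 0.25)  # lead times in seconds
    tpr = []
    for lead in leads:
        hits = sum(1 for s, o in zip(scores, onsets)
                   if o - int(lead * fps) >= 0
                   and s[o - int(lead * fps)] > thresh)
        tpr.append(hits / len(scores))
    return leads, np.asarray(tpr)
```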

Concluding remarks

In this work, a surveillance application of driver assistance was studied. Automotive driver assistance systems must perform under time-critical constraints, where even tens of milliseconds are essential. A holistic and comprehensive understanding of the driver’s intentions can help in gaining crucial time and save lives. Prediction of human activities was studied using information fusion from an array of sensors in order to fully capture the development of complex temporal interdependencies in

Acknowledgments

The authors would like to thank the reviewers and editors for their helpful comments. The authors gratefully acknowledge sponsorship of the UC Discovery Program and associated industry partners including Audi, Volkswagen Electronics Research Laboratory, and Toyota Motors. Support of colleagues from the UCSD Laboratory for Intelligent and Safe Automobiles is also appreciated.

References (40)

  • R. Simmons, B. Browning, Y. Zhang, V. Sadekar, Learning to predict driver route and destination intent, in: IEEE Conf....
  • S. Lefèvre, C. Laugier, J. Ibanez-Guzmán, Exploiting map information for driver intention estimation at road...
  • M. Liebner, M. Baumann, F. Klanner, C. Stiller, Driver intent inference at urban intersections using the intelligent...
  • M. Ortiz, F. Kummert, J. Schmudderich, Prediction of driver behavior on a limited sensory setting, in: IEEE Conf....
  • V. Gadepally et al., A framework for estimating driver decisions near intersections, IEEE Trans. Intell. Transp. Syst. (2014)
  • S. Lefèvre, J. Ibañez-Guzmán, C. Laugier, IEEE Symposium on Comp. Intell. Veh. Transp. Syst....
  • A. Doshi, M.M. Trivedi, Tactical driver behavior prediction and intent inference: a review, in: IEEE Conf. Intelligent...
  • A. Doshi et al., On-road prediction of driver’s intent with multimodal sensory cues, IEEE Pervasive Comput. (2011)
  • M.-I. Toma et al., Determining car driver interaction intent through analysis of behavior patterns
  • S. Haufe et al., EEG potentials predict upcoming emergency brakings during simulated driving, J. Neural Eng. (2011)