Elsevier

Image and Vision Computing

Volume 32, Issue 12, December 2014, Pages 1102-1116
Review article
Dynamic scene understanding using temporal association rules

https://doi.org/10.1016/j.imavis.2014.08.010

Highlights

  • Uses temporal mining techniques for event recognition in dynamic scenes.

  • Temporal association rules are then generated from frequent patterns. These association rules help model the sequence cycle.

  • Spatio-temporal anomalies are identified and detected in a hierarchical manner.

Abstract

The basic goal of scene understanding is to organize the video into sets of events and to find the associated temporal dependencies. Such systems aim to automatically interpret activities in the scene, as well as detect unusual events that could be of particular interest, such as traffic violations and unauthorized entry. The objective of this work, therefore, is to learn behaviors of multi-agent actions and interactions in a semi-supervised manner. Using tracked object trajectories, we organize similar motion trajectories into clusters using the spectral clustering technique. This set of clusters depicts the different paths/routes, i.e., the distinct events taking place at various locations in the scene. A temporal mining algorithm is used to mine interval-based frequent temporal patterns occurring in the scene. A temporal pattern indicates a set of events that are linked based on their relationship with other events in the set, and we use Allen's interval-based temporal logic to describe these relations. The resulting frequent patterns are used to generate temporal association rules, which convey the semantic information contained in the scene. Our overall aim is to generate rules that govern the dynamics of the scene and perform anomaly detection. We apply the proposed approach on two publicly available complex traffic datasets and demonstrate considerable improvements over the existing techniques.

Introduction

In visual surveillance, there has been an increasing interest in recognizing object behaviors by interpreting the high-level semantics of scene dynamics. However, computing relationships between different actions in the scene or detecting rare events in an ocean of video data is a daunting task. Manual analysis of event interactions depends entirely on human operators and is practically impossible at this scale. In addition, as the scene gets crowded, the complexity of the relationships between the agents increases as well. Even though scene understanding has become an active research area, it remains a complex problem with many constraints, and an unsupervised method is required to make the task easier. An elegant solution to this problem can open doors to a wide spectrum of applications, such as video surveillance [1], anomaly detection [2], and crowd analysis [3].

Typically, the input to a dynamic scene analysis system is a video, and the first task is to detect moving objects and record their motion characteristics, in the form of object trajectories (or optical flows). Each trajectory denotes an individual event in the scene during a time interval. This step is generally followed by behavior or activity segmentation, which identifies semantically meaningful components and groupings to reveal different events. Traditionally, algorithms such as K-means and fuzzy clustering have been used extensively, while many recent works have explored spectral clustering and normalized cuts [4]. The resulting clusters model the various events, indicating the spatial layout of the scene. Finally, the last step is to learn the temporal scene behavior. Behavior in our context explains the way an object acts in relation to the other objects in the scene. It can be defined as a sequence of events with spatial and temporal constraints. Recently, probabilistic methods such as Dynamic Bayesian Networks (DBN) [5], Hidden Markov Models (HMM) [6], and Probabilistic Topic Models (PTM) [2], have been used extensively by the computer vision community to learn the scene dynamics.

The dynamic scene understanding problem can be expressed as: obtain the motion patterns in the scene, build the scene structure and lastly, interpret the high-level semantics of the scene. A dynamic scene may also involve multiple agents interacting with one another, and the actions may occur in parallel with one another or recur over time. Thus, we are interested in answering questions such as: what is happening in the scene, where the objects are located and how they interact within their environment. In this work, we aim at developing a robust system that can learn the scene model with minimal human intervention. In this regard, video mining can help extract salient information from a video without such supervision [7]. In order to analyze and discover the temporal interdependencies and relationships between various events occurring in a scene, we make use of temporal mining algorithms. These relationships between events are modeled as temporal patterns, discovered using a frequent temporal pattern mining algorithm. A frequent temporal pattern can be defined as a set of composite events that occur repetitively in the video, and are expressed using temporal relations in Allen's taxonomy [8], such as before, after, and meet. Once these frequent patterns are obtained, forward temporal association rules are generated. These rules capture the correlations between the frequent temporal patterns present in the video.
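
To make the interval relations concrete, the following is a minimal sketch of how the Allen relation between two event intervals could be computed. The function name `allen_relation`, the `(start, end)` tuple representation, and the frame numbers in the usage note are illustrative assumptions, not part of the original method.

```python
from typing import Tuple

# Assumption: an event interval is a (start, end) pair with start < end,
# for example frame indices or timestamps in seconds.
Interval = Tuple[float, float]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from interval a to b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")

    # Disjoint intervals, with or without a gap between them.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"

    # Intervals that share at least one endpoint.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"

    # One interval strictly inside the other.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"

    # Remaining case: partial overlap with all four endpoints distinct.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

For example, with hypothetical frame ranges, `allen_relation((0, 120), (120, 300))` yields `"meets"`, while `allen_relation((0, 120), (90, 300))` yields `"overlaps"`.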

We define an anomaly as an atypical behavioral pattern based entirely on the model in context, thus every scene can have a different set of anomalies. In this work, anomaly detection is performed in a hierarchical manner. First, we identify unusual events within a spatial context. These spatial anomalies can be found once unique event clusters are identified. The second type of anomalous behavior can be found by using frequent temporal patterns (and their time duration) to discriminate between the usual and the unusual complex composite events.
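
The spatial level of this hierarchy can be sketched as a nearest-centroid check: a test trajectory is assigned to the closest event cluster, and flagged as unusual when even the closest cluster is too far away. This is only a sketch under stated assumptions. The distance measure (mean point-wise Euclidean distance between trajectories resampled to equal length), the `max_distance` threshold, and the event names are all hypothetical and are not taken from the original method.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def mean_pointwise_distance(a: Trajectory, b: Trajectory) -> float:
    """Assumed distance: average Euclidean gap between corresponding points."""
    if len(a) != len(b) or not a:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def classify_or_flag(
    trajectory: Trajectory,
    centroids: Dict[str, Trajectory],
    max_distance: float,
) -> Tuple[Optional[str], float]:
    """Return (event name, distance), or (None, distance) for a spatial anomaly."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    # Nearest-neighbor search over the cluster centroids.
    best_event, best_dist = min(
        ((name, mean_pointwise_distance(trajectory, centroid))
         for name, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    # Hypothetical rule: too far from every known event means "unusual".
    if best_dist > max_distance:
        return None, best_dist
    return best_event, best_dist
```

Trajectories that pass this check would then be handed to the temporal level, where composite events are compared against the frequent temporal patterns.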

Our goal is to extract complex activity patterns in a multi-agent environment. This is not trivial: in most real-world scenarios, the underlying dynamic scene behavior is very complex and perhaps ambiguous, making high-level activity interpretation a challenge. Most of the existing techniques employ various probabilistic models; however, learning and inference in such methods are computationally prohibitive. Moreover, as the scene gets crowded, the complexity of the relationships increases, and this necessitates a huge amount of training data for accurate analysis. Therefore, in this work we propose to learn the scene dynamics using temporal mining techniques. The frequent pattern discovery algorithm utilized in this work is exploratory in nature. In addition, pattern matching allows for accurate and efficient anomaly detection.

The main contributions of this work are as follows:

  • To the best of our knowledge, temporal mining techniques have not been used for event recognition in dynamic scenes. We discover frequent temporal patterns using [9] to learn the scene behavior.

  • We indicate exactly how two events are related (overlaps, equals, starts, etc.) using Allen's relations [8]. Moreover, we include the duration of composite events in each pattern.

  • To eliminate the spurious frequent temporal patterns discovered, we suggest a few steps in Section 5.2 in order to prune the pattern space.

  • Once these patterns are obtained, we generate temporal rules. These temporal association rules help model the traffic cycle sequence, which is the main test domain for our work.

  • Using a hierarchical anomaly detection algorithm, spatial anomalies are detected based on object trajectories, and spatio-temporal anomalies are identified using a frequent pattern matching approach.

The approach also has the following limitations:

  • We track objects to obtain events that unfold over time. As with any trajectory-based approach, a good tracking algorithm is needed to overcome its inherent issues. In this paper, we focus only on vehicle motion in complex traffic scenes. Pedestrian activity is disregarded as complete trajectories are hard to obtain in crowded scenes.

  • For the temporal mining algorithms, user-defined parameters have to be determined by domain experts. Even though mining techniques do not require the definition of events or rules in advance, the temporal support and the confidence thresholds (cf. Table 1) have to be specified.
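
To illustrate what the two thresholds act on, here is a minimal sketch that computes support and confidence using the standard association-rule definitions. It assumes each video clip has already been reduced to a set of temporal pattern labels. The function names, the label strings, and the clip data in the usage note are hypothetical, and the exact definitions used in the paper (cf. Table 1) may differ.

```python
from typing import List, Set


def support(clips: List[Set[str]], pattern: Set[str]) -> float:
    """Fraction of clips that contain every label in `pattern`."""
    if not clips:
        return 0.0
    hits = sum(1 for clip in clips if pattern <= clip)
    return hits / len(clips)


def confidence(
    clips: List[Set[str]], antecedent: Set[str], consequent: Set[str]
) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(clips, antecedent)
    if base == 0.0:
        return 0.0  # the rule is never triggered, so report zero confidence
    return support(clips, antecedent | consequent) / base


def passes_thresholds(
    clips: List[Set[str]],
    antecedent: Set[str],
    consequent: Set[str],
    min_support: float,
    min_confidence: float,
) -> bool:
    """Keep a rule only if it clears both user-defined thresholds."""
    joint = support(clips, antecedent | consequent)
    return joint >= min_support and confidence(
        clips, antecedent, consequent
    ) >= min_confidence
```

With four hypothetical clips in which "A before B" appears three times and is accompanied by "B meets C" twice, the rule "A before B" -> "B meets C" has support 0.5 and confidence 2/3, so it would be kept at a confidence threshold of 0.6 but discarded at 0.7.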

The paper is organized as follows: Section 2 presents some existing works on the topic. Section 3 briefly describes the proposed methodology. Section 4 focuses on feature extraction and segmentation, while Section 5 presents the second phase, i.e., using the video mining techniques to learn the dynamic scene model. The anomaly detection methodology is discussed at length in Section 6. Experiments are conducted on two datasets, and the results with evaluation measures are illustrated and explained in Section 7, followed by conclusions in Section 8.

Section snippets

Related work

Existing approaches in the literature generally start with motion feature extraction, such as object trajectories or optical flow. Event modeling is done by clustering these features using similarity-based distance measures. Trajectory-based approaches [3], [10], [11], [12] primarily rely on how well a tracker performs. The results may be compromised in crowded scenarios due to the presence of multiple objects, inter-object occlusions and low-resolution videos [3], [13]. In their seminal work,

Overview

The proposed approach is illustrated in Fig. 1 and comprises the following steps:

  • Feature extraction: We employ a semi-automatic mean-shift tracker [35] to obtain object trajectories.

  • Motion segmentation: Spectral clustering is used to cluster trajectories into different event classes. The number of clusters is determined iteratively.

  • Learning frequent temporal patterns: Relationships are discovered between events based on their time duration characteristics. Temporal patterns, often represented

Feature extraction and segmentation

Objects tend to follow common pathways in a traffic scenario, and two key points are of particular interest: the entry point, where an object appears in the scene, and the exit point where it disappears from the scene. Since we focus solely on traffic scenarios in this work, we use [35] to perform the object tracking, and pedestrian trajectories, if any, are subsequently removed (as in [28], [36]). Moving average low-pass filters are used to remove noise from the trajectories.
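
The noise-removal step can be sketched as a centered moving average over the tracked points. The window size, the shrinking-window handling at the trajectory ends, and the function names below are assumptions made for illustration; the excerpt does not specify these details.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def smooth_trajectory(points: List[Point], window: int = 5) -> List[Point]:
    """Low-pass filter a trajectory with a centered moving average.

    Assumption: near the ends the window is truncated to the available
    points, so the output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed: List[Point] = []
    for i in range(len(points)):
        lo = max(0, i - half)
        hi = min(len(points), i + half + 1)
        neighbours = points[lo:hi]
        mean_x = sum(p[0] for p in neighbours) / len(neighbours)
        mean_y = sum(p[1] for p in neighbours) / len(neighbours)
        smoothed.append((mean_x, mean_y))
    return smoothed


def entry_exit(points: List[Point]) -> Tuple[Point, Point]:
    """Return the entry point (first sample) and exit point (last sample)."""
    if not points:
        raise ValueError("trajectory is empty")
    return points[0], points[-1]
```

A larger window suppresses more tracker jitter but also rounds off genuine turns, so the window size is a trade-off that would need tuning per scene.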

The extracted

Video association mining

Events recur over time, which means that each event corresponds to multiple time intervals. We first form event sequences and then extract the frequent temporal patterns from them. Allen's First Order Interval Logic is used to describe relationships between event pairs in sequences. Next, temporal association rules are generated from the obtained frequent patterns (Section 5.3). Association rules are used to predict future events or the expected behavior between various

Spatial level

Each trajectory cluster defines a single event and each event is represented by its cluster centroid. That is, the centroid models the general appearance of trajectories for any given event [37]. Having obtained the individual events in the scene, trajectories in test clips are classified to their respective event categories. The nearest-neighbor classification scheme is utilized for this purpose, where the distance of each test trajectory is computed to all other centroid trajectories using

Datasets

We test our system on two public datasets [45]. These datasets feature complex activities between numerous agents in the scene, governed by traffic lights.

Conclusions

In this work, we have proposed a method that analyzes traffic patterns and detects irregular events. To the best of our knowledge, temporal mining techniques have not been used for event recognition in dynamic scenes. We first discover frequent temporal patterns and use Allen's temporal relations [8] for representation. The time duration of composite events is included in the pattern as well. Temporal association rules are then generated from these frequent patterns. These association rules help

References (45)

  • J. Allen et al. Actions and events in interval temporal logic. J. Log. Comput. (1994)

  • D. Patel et al. Mining relationships among interval-based events for classification

  • E. Jouneau et al. Particle-based Tracking Model for Automatic Anomaly Detection (2011)

  • V. Morariu et al. Multi-agent event recognition in structured scenarios

  • Z. Zhang et al. Trajectory series analysis based event rule induction for visual surveillance

  • T. Hospedales et al. Identifying rare and subtle behaviours: a weakly supervised joint topic model. IEEE Trans. Pattern Anal. Mach. Intell. (2011)

  • C. Stauffer et al. Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Anal. Mach. Intell. (2000)

  • R. Emonet et al. Extracting and locating temporal motifs in video scenes using a hierarchical nonparametric Bayesian model

  • J. Li et al. Discovering multi-camera behaviour correlations for on-the-fly global activity prediction and anomaly detection

  • J. Varadarajan et al. Topic models for scene analysis and abnormality detection

  • C. Loy et al. Stream-based active unusual event detection. ACCV (2010)

  • L. Song et al. Understanding dynamic scenes by hierarchical motion pattern mining

This paper has been recommended for acceptance by Ivan Laptev.