
Pattern Recognition

Volume 58, October 2016, Pages 110-120

Robust individual and holistic features for crowd scene classification

https://doi.org/10.1016/j.patcog.2016.03.031

Abstract

In this paper, we present an approach that utilizes multiple exemplar agent-based motion models (AMMs) to extract motion features, representing crowd behaviors, from captured crowd trajectories. Within the exemplar-based framework, we propose an iterative optimization algorithm, based on the Extended Kalman Smoother (EKS) and KL-divergence, to measure the correlation between any exemplar AMM and the trajectory data. In addition, based on the proposed correlation measure, we introduce a novel individual feature that, in combination with a holistic feature, describes crowd motions. Our results show that the proposed features perform well in classifying real-world crowd scenes.

Introduction

In recent years, researchers in many fields have become interested in understanding and analyzing pedestrians' behaviors in crowds, because of practical applications in event recognition, traffic flow estimation, and crowd motion prediction. One of the fundamental problems in all these crowd analysis applications is the recognition of crowd scenes. This is a difficult research problem because crowd motion patterns are complicated by factors such as dynamic crowd density, scene configuration, and crowd psychology. In computer vision research, some prior works formulate the crowd scene recognition problem as event recognition or anomaly detection [1], [2], [3], [4], [5], which aims to extract local-motion patterns from crowd videos. Other prior works model the interactions among a small number of persons in order to perform action recognition [6], [7], while [8] detects abnormal behaviors in dense crowd interaction scenarios. Despite the development of these crowd scene recognition methods, they are not robust enough for application across different scenarios: they require much effort in removing the negative impact of vision-related factors (e.g., background noise and perspective transformation) and in learning a scene-specific model.

In order to investigate the underlying principles of crowd motions, our work aims to learn features from crowd trajectories. Recently, pedestrian tracking techniques have progressed to the point that they can reliably capture the trajectories of crowd motions. Therefore, we believe that crowd scene recognition can take advantage of the captured crowd trajectories. Many prior vision-based works rely on optical flow or key-point tracking, so researchers have to deal with unassociated tracklets or background noise when analyzing crowd motion, and the results of the analysis can be affected by factors such as camera positions and angles. In comparison, given the pedestrian trajectories, the results of the analysis can be more informative. In addition to the crowd's holistic feature, we may also consider the motion feature of individuals. Previous trajectory-based crowd analysis works are generally based on trajectory clustering [9] or semantic region inference [10], [11]. Few works have used trajectories for crowd scene recognition, possibly due to the difficulty of regularizing spatial and temporal trajectory data. In particular, for crowd scenes, the number of pedestrians may differ, the duration of the captured crowd motion sequences may differ, and the crowd trajectories are usually compounded with unknown noise. However, crowd trajectories are informative for studying individuals' interactions within a crowd, e.g., how pedestrians react to oncoming opponents, which provides more insight into crowd scene understanding.

In this paper, we propose a crowd scene recognition algorithm that can handle the difficulty of regularizing the trajectories. The given trajectories allow us to study individual interactions as the individual motion feature. As prior works in visual pedestrian tracking [12], [13] suggest, such interactions, especially collision avoidance and grouping motion, are crucial in crowd motions. However, directly quantifying the interactions from crowd-motion data is a challenging task. To address this problem, we apply agent-based motion models, denoted as AMMs, which have been demonstrated to be effective at modeling crowd interactions. In order to bridge the gap between AMMs and crowd trajectories, we propose a novel iterative optimization algorithm based on the Extended Kalman Smoother (EKS) and KL-divergence, which estimates how well an AMM models a specific crowd motion at both the individual level and the holistic level. The intuition is that, since we assume that the AMM-based state transition and the captured noisy sequential crowd states follow Gaussian distributions, we can apply KL-divergence to measure the distance between these two distributions.
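
As a concrete illustration of this last point, the KL-divergence between two Gaussians has a closed form, which is what makes such a comparison tractable. The sketch below shows the generic formula only, not the paper's full iterative EKS procedure; the example means and covariances are made-up illustrative values.

```python
# Minimal sketch: closed-form KL-divergence between two multivariate Gaussians,
# e.g., an AMM-predicted state distribution vs. a smoothed/observed one.
# The means/covariances below are illustrative values, not from the paper.
import numpy as np

def gaussian_kl(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for d-dimensional Gaussians."""
    d = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    trace_term = np.trace(S1_inv @ S0)
    quad_term = diff @ S1_inv @ diff
    logdet_term = np.log(np.linalg.det(S1) / np.linalg.det(S0))
    return 0.5 * (trace_term + quad_term - d + logdet_term)

# Example: a pedestrian's predicted vs. smoothed 2-D position distribution.
mu_pred, S_pred = np.array([1.0, 0.5]), 0.2 * np.eye(2)
mu_obs,  S_obs  = np.array([1.1, 0.4]), 0.3 * np.eye(2)
print(gaussian_kl(mu_pred, S_pred, mu_obs, S_obs))
```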

Prior works on crowd analysis usually aim at learning a general model, which requires a large amount of training data. Hence, using a single AMM to recognize different types of crowd motion is not robust. The fact that most AMMs are non-linear models controlled by several parameters further increases the difficulty of training a robust model. To obviate this difficulty in training a suitable agent-based motion model, we draw inspiration from prior works on recognition (e.g., [14], [34]) that leverage exemplar models to infer unknown models. We apply a similar exemplar-based method that leverages multiple different AMMs to jointly evaluate the input crowd-motion data. Fig. 1 shows the framework. The query crowd trajectories are compared with multiple exemplar AMMs using our proposed iterative optimization algorithm. These exemplar AMMs can model/simulate different types of motion (e.g., the blue AMM models repulsive motions while the purple AMM models penetrative motions). The more similar an AMM is to the real data, the lower its score in the holistic feature.
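
To make the structure of the holistic feature explicit, a minimal sketch is given below: it simply stacks the per-exemplar scores into one vector. Here `measure_correlation` is a hypothetical placeholder for the EKS/KL-based iterative optimization, which is not reproduced.

```python
# Sketch only: the holistic feature as the vector of per-exemplar-AMM scores.
# `measure_correlation` is a placeholder for the EKS/KL-based measure; lower
# values indicate that the exemplar AMM fits the query trajectories better.
import numpy as np

def holistic_feature(trajectories, exemplar_amms, measure_correlation):
    """Return one score per exemplar AMM; together they describe the crowd motion."""
    return np.array([measure_correlation(amm, trajectories) for amm in exemplar_amms])
```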

In addition, the proposed optimization algorithm can evaluate a crowd motion not only holistically but also individually. The evaluation of the individual motion can serve as a feature, or an individual motion descriptor. After collecting the individual motion descriptors for different individuals in various crowd scenes, we can categorize the individuals. In this work, we assume that the distribution of different kinds of individuals in a crowd scene determines the style of the crowd motion. For instance, a crowd scene in which most of the individuals belong to the same motion category is likely a coherent motion, while a crowd scene with many heterogeneous individuals is probably a random motion. Based on this assumption and the individual categorization, we propose an individual feature to describe crowd motion, as shown in Fig. 1 and explained in Section 3.5.
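
One plausible way to realize this idea is sketched below: pool the individual motion descriptors from all scenes, cluster them (k-means is an assumption here, not necessarily the paper's choice), and describe each scene by the normalized distribution of its individuals over the clusters.

```python
# Hedged sketch of an individual feature: a scene is summarized by how its
# individuals distribute over motion categories learned from pooled descriptors.
import numpy as np
from sklearn.cluster import KMeans

def individual_feature(scene_descriptors, all_descriptors, n_clusters=8):
    """scene_descriptors: (n_individuals, d) for one scene; all_descriptors: pooled over scenes."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(all_descriptors)
    labels = km.predict(scene_descriptors)            # motion category of each individual
    hist = np.bincount(labels, minlength=n_clusters).astype(float)
    return hist / hist.sum()                          # normalized category distribution
```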

Contributions: In this paper, we present a framework that extracts motion features from the given crowd trajectories based on multiple exemplar AMMs, and propose an algorithm that measures the KL-divergence between the crowd trajectories and any of the AMMs. To describe the crowd trajectories, we learn a holistic feature and an individual feature. While the holistic feature is a direct descriptor of the crowd motion sequences with regard to different AMMs, the individual feature stems from the distribution of different kinds of individuals, which is inferred from the individual clusters collected across crowd scenes. To evaluate our features, we perform multi-label classification on real-world crowd data.

The rest of this paper is organized as follows. Section 2 summarizes related works. Section 3 introduces our exemplar-based framework and presents how to compute the individual and holistic features. Section 4 presents the multi-label classification formulation. Finally, Section 5 evaluates the proposed approach and Section 6 briefly concludes this work.

Related works

In this section, we review existing works on visual analysis of crowds and agent-based motion models.

Exemplar-based framework

As mentioned, learning a single robust model for recognizing crowd scenes is difficult. To accomplish this task, we set up an exemplar-based framework that leverages multiple AMMs to generate a robust crowd motion feature (see Fig. 1). In this section, we first introduce our algorithm to measure the differences between an AMM and the trajectories. Given different AMMs that are able to model various crowd motions, the proposed algorithm measures the correlation between the AMMs and the query crowd trajectories.
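
For intuition, the toy transition function below shows the kind of non-linear agent-based dynamics an exemplar AMM could encode (a simple pairwise-repulsion rule; the model and its parameters are illustrative assumptions, not the exemplars used in this work). A non-linear transition of this form is what an Extended Kalman Smoother would linearize around when comparing an AMM against observed trajectories.

```python
# Toy repulsion-style AMM step, illustrative only. Each agent state is
# [x, y, vx, vy]; agents keep their velocity plus a repulsive push from
# neighbors within `radius`. Parameters (dt, k_rep, radius) are assumptions.
import numpy as np

def amm_step(states, dt=0.4, k_rep=1.5, radius=2.0):
    pos, vel = states[:, :2], states[:, 2:]
    force = np.zeros_like(pos)
    for i in range(len(pos)):
        diff = pos[i] - pos                          # vectors from neighbors to agent i
        dist = np.linalg.norm(diff, axis=1)
        near = (dist > 1e-6) & (dist < radius)       # nearby neighbors, excluding self
        force[i] = (k_rep * diff[near] / dist[near, None] ** 2).sum(axis=0)
    new_vel = vel + dt * force
    new_pos = pos + dt * new_vel
    return np.hstack([new_pos, new_vel])
```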

Multi-label classification

It is sometimes difficult to assign a single label to real-world crowd scenes. In this paper, we treat the crowd motion recognition problem as a multi-label classification problem. Specifically, let $\mathcal{X}$ be the space of instance feature vectors (e.g., of crowd motion sequences) and $\mathcal{Y} = \{1, 2, \ldots, Q\}$ be a finite set of labels. Given a training set $T = \{(x_1, Y_1), (x_2, Y_2), \ldots, (x_m, Y_m)\}$, where $x_i \in \mathcal{X}$ and $Y_i \subseteq \mathcal{Y}$, the goal is to output a multi-label classifier $h: \mathcal{X} \rightarrow 2^{\mathcal{Y}}$, which is usually realized as a real-valued function $g: \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}$; the label set predicted for an instance is then obtained by thresholding the outputs of $g$.
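
A minimal sketch of this setup, using binary relevance with scikit-learn (an illustrative assumption; the paper's actual classifier, features, and labels are not reproduced here), is:

```python
# Sketch: multi-label classification by learning one real-valued scorer g(., q)
# per label q and predicting h(x) = {q : g(x, q) > 0}. Toy data, not the paper's.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 16))              # m = 40 instances, 16-dim features
Y = rng.integers(0, 2, size=(40, 3))           # Q = 3 labels, indicator matrix
Y[0], Y[1] = 0, 1                              # ensure every label column has both classes

clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)   # one binary scorer per label
pred = clf.predict(X[:5])                          # predicted label sets for 5 instances
print(pred)
```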

Experiments

In this section, we first introduce our dataset and how we select the exemplar AMMs. We then evaluate the individual and the holistic features.

Conclusion and future work

In this paper, we propose an exemplar-based method to extract features of crowd motions from the crowd trajectories. In particular, we propose an algorithm based on EKS and KL-divergence to compute the individual and the holistic features. We demonstrate that the proposed features perform well in recognizing real-world crowd scenes.

There are several related problems to address in the future. First, our method is limited to pedestrian trajectory input; it remains a difficult problem to automatically extract reliable pedestrian trajectories from raw crowd videos.

Conflict of interest

None declared.

References (34)

  • Y. Cong et al., Abnormal event detection in crowded scenes using sparse representation, Pattern Recognit. (2013)
  • X. Zhu et al., Sparse representation for robust abnormality detection in crowded scenes, Pattern Recognit. (2014)
  • M. Zhang et al., ML-KNN: a lazy learning approach to multi-label learning, Pattern Recognit. (2007)
  • X. Wang et al., Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • D. Kuettel, M. Breitenstein, L. van Gool, V. Ferrari, What's going on? Discovering spatio-temporal dependencies in...
  • V. Mahadevan, W. Li, V. Bhalodia, N. Vasconcelos, Anomaly detection in crowded scenes, in: IEEE Conference on Computer...
  • W. Choi, S. Savarese, A unified framework for multi-target tracking and collective activity recognition, in: European...
  • R. Li et al., Recognizing interactive group activities using temporal interaction matrices and their Riemannian statistics, Int. J. Comput. Vis. (2013)
  • R. Mehran, A. Oyama, M. Shah, Abnormal crowd behavior detection using social force model, in: IEEE Conference on...
  • W. Ge et al., Vision-based analysis of small groups in pedestrian crowds, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • X. Wang et al., Trajectory analysis and semantic region modeling using nonparametric hierarchical Bayesian models, Int. J. Comput. Vis. (2011)
  • B. Zhou, X. Wang, X. Tang, Understanding collective crowd behaviors: learning a mixture model of dynamic...
  • S. Ali, M. Shah, Floor fields for tracking in high density crowd scenes, in: European Conference on Computer Vision,...
  • S. Pellegrini, A. Ess, K. Schindler, L. van Gool, You'll never walk alone: modeling social behavior for multi-target...
  • T. Malisiewicz, A. Gupta, A. Efros, Ensemble of exemplar-svms for object detection and beyond, in: IEEE International...
  • A. Chan et al., Modeling, clustering, and segmenting video with mixtures of dynamic textures, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
  • B. Morris, M. Trivedi, Learning and classification of trajectories in dynamic scenes: a general framework for live...