Neurocomputing

Volume 100, 16 January 2013, Pages 41-50

Multi-camera tracking using a Multi-Goal Social Force Model

https://doi.org/10.1016/j.neucom.2011.09.038

Abstract

Tracking across non-overlapping cameras is a challenging open problem in video surveillance. In this paper, we propose a novel target re-identification method that models movements in non-observed areas with a modified Social Force Model (SFM) by exploiting the map of the site under surveillance. The SFM is developed with a goal-driven approach that models the desire of people to reach specific interest points (goals) of the site such as exits, shops, seats and meeting points. These interest points act as attractors for people's movements and guide the path predictions in the non-observed areas. We also model key regions, i.e., potential intersections of different paths where people can change their direction of motion. Finally, the predictions are linked to the trajectories observed in the next camera view where people reappear. We validate our multi-camera tracking method on the challenging i-LIDS dataset from the London Gatwick airport and show the benefits of the Multi-Goal Social Force Model.

Introduction

Wide indoor and outdoor sites are extensively monitored by networks of cameras whose Fields-Of-View (FOV) do not necessarily overlap, thus making the task of tracking a person across the network very challenging (Fig. 1). When dealing with multi-camera tracking, existing methods solve the trajectory association problem by relying on a training phase that learns the relationships between camera pairs. Most algorithms are based on a minimization method that finds the correspondences between trajectories from each camera in the network [1]. The minimization process usually aims at finding the best match between appearance and motion features of the target. Common strategies that tackle this problem by relying on appearance matching across cameras [2] can only be applied when people are well visible and recognizable. Other algorithms integrate appearance features with motion information and use the traveling time and the reappearance position within the next observed region as key features for the minimization process [3].

One of the first attempts to solve the multi-camera tracking problem is presented in [4], where Kettnaker and Zabih use a Bayesian formulation for path reconstruction in a non-overlapping camera network. Their main assumption is that one object can only be at one specific position at a certain time. Matching observations across frames produces chains of observations that form object trajectories across the different views. In a more recent work, Javed et al. [5] track across multiple cameras using pedestrian trajectories obtained from single-camera tracking in the observed regions and exploit the relationship between the FOV lines on the same common ground plane. The object motion across cameras is then estimated by minimizing the Euclidean distance. Furthermore, Javed et al. [6] use inter-camera space–time and appearance probabilities to find an object in different cameras by maximizing the conditional probability of the corresponding observations. To match an object after it has moved through non-observed regions, space–time and appearance models are learned and updated on-line. A further improvement of multi-camera tracking based on appearance and motion is presented in [1], where the Brightness Transfer Function (BTF) color mapping between camera pairs is expected to lie in a low-dimensional space. This lower dimensionality aids trajectory association, which is performed by an optimization step on the available trajectories using the position and the appearance of the target. A similar problem is tackled in [7], where the appearance of the target is matched in the Consensus-Color Conversion of Munsell (CCCM) color space, the main paths are grouped by unsupervised clustering, and the time needed for a target to travel from one camera to the next is analyzed and learned by associating only potential targets. A different approach is presented in [3], where the appearance of people is matched across cameras using color, the covariance matrix and Histograms of Oriented Gradients. The feature mapping across cameras is learned on-line and the Hungarian algorithm is used to solve the association problem.
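As an illustration of the assignment step used in [3] and similar methods, the following minimal Python sketch applies the Hungarian algorithm to a cost matrix of feature distances between tracks exiting one camera and tracks appearing in the next. The cost values and the gating threshold are illustrative placeholders, not the cited papers' exact formulations.

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_tracks(costs, max_cost=1.0):
    # Solve the optimal one-to-one assignment on the (possibly rectangular)
    # cost matrix, then discard pairs whose cost exceeds the gate.
    rows, cols = linear_sum_assignment(costs)
    return [(int(i), int(j)) for i, j in zip(rows, cols)
            if costs[i, j] < max_cost]

# Example: three exiting tracks versus two reappearing tracks.
costs = np.array([[0.2, 0.9],
                  [0.8, 0.1],
                  [0.7, 0.6]])
print(associate_tracks(costs))  # -> [(0, 0), (1, 1)]; track 2 stays unmatched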

In the presence of non-observed areas, there are no direct measurements of a person that can be used to facilitate tracking across cameras. Predicting the exact position where a person exiting the FOV of one camera will appear in the FOV of the next camera is very challenging due to the presence of various barriers and potential interactions occurring in the non-observed regions. Moreover, in the presence of a crowd, partial and complete occlusions generate challenging situations for the above-described methods. Additional challenges are due to changes in illumination conditions across cameras (e.g. the presence of a large window against an area with artificial illumination only), clutter (different people can look very similar) and different body poses.

In this paper, we tackle the multi-camera tracking problem by modeling the path of walking people without using appearance features. We predict where people move using a goal-driven model that creates hypotheses on where they are likely to reappear, in order to facilitate the person re-identification process. Each person is assigned a set of possible goals [9], i.e., interest points in the site such as shops, doors, exits, seats and key points for movement. In order to propagate people's movements in non-observed regions, we use a motion model developed for crowd simulation [10]. Each person is modeled as an agent that can move freely on the top-view map, trying to reach the selected goals while avoiding barriers and walls and maintaining a desired speed. To perform multi-camera tracking, a matching process based on the spatio-temporal distances between the predictions and the single-camera tracks in the next observed region is carried out. This process does not require points of view and illumination conditions to be consistent across the camera network. The main contributions of our work are: (a) the use of a motion prediction model to estimate the positions of people in non-observed areas; (b) the definition of multi-camera tracking as an on-line re-identification problem without using appearance features; (c) the development of a simple parameter-based model for trajectory prediction that can be easily instantiated for a specific site. To the best of our knowledge, this is the first application and adaptation of a crowd simulation algorithm to a multi-camera tracking problem.
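To make the agent-based propagation concrete, the sketch below implements one update step of a goal-driven Social Force Model in its classic Helbing–Molnár form: a driving term that relaxes the agent's velocity toward its current goal at a desired speed, plus exponentially decaying repulsion from nearby wall points. The parameter values and the wall representation are assumptions for illustration, not the calibrated model used in the paper.

import numpy as np

def sfm_step(pos, vel, goal, wall_points, dt=0.1,
             v_des=1.3, tau=0.5, A=2.0, B=0.3):
    # Driving force: steer the velocity toward the goal at the desired speed.
    to_goal = goal - pos
    e_goal = to_goal / (np.linalg.norm(to_goal) + 1e-9)
    force = (v_des * e_goal - vel) / tau
    # Repulsive forces: each wall point pushes the agent away, with a
    # magnitude that decays exponentially with distance.
    for w in wall_points:
        diff = pos - w
        dist = np.linalg.norm(diff) + 1e-9
        force += A * np.exp(-dist / B) * (diff / dist)
    # Euler integration of the agent state on the top-view map.
    vel = vel + force * dt
    pos = pos + vel * dt
    return pos, vel

# Example: an agent starting at rest, walking toward a goal past one wall point.
pos, vel = np.array([0.0, 0.0]), np.array([0.0, 0.0])
goal, walls = np.array([10.0, 0.0]), [np.array([1.0, 0.5])]
for _ in range(5):
    pos, vel = sfm_step(pos, vel, goal, walls)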

The paper is organized as follows: Section 2 discusses the related work in the field of motion modeling for tracking. Section 3 presents the Social Force Model (SFM) and its modification for our goal-driven prediction. In Section 4, our model is validated using the i-LIDS dataset from the London Gatwick airport. Finally, Section 5 draws conclusions and discusses future work.

Section snippets

Related work

Crowd simulation approaches can be grouped into three main categories according to how the relationships between pedestrians are modeled: macroscopic, microscopic and mesoscopic [11]. Macroscopic approaches consider the crowd as a single entity and model movements as a flow that people follow. Microscopic approaches consider each person as an entity and model each person's movement by considering various factors such as the interaction with other people and …

Overview of the proposed approach

In this paper, we develop a modified Social Force Model for multi-camera tracking. The multi-camera tracking problem is formulated as an on-line target re-identification problem in which a person exiting one camera view is identified in the next camera view (where observations become available again), after having crossed non-observed areas. We assume that an approximate map of the environment is known and integrate it with a modified Social Force Model [19] to model the behavior of walking …
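The re-identification step described above can be illustrated with a minimal sketch: each person who left a view carries a set of predicted reappearance states (position and time) propagated by the Social Force Model, one per goal hypothesis, and a newly observed track is linked to the person whose prediction is closest in space and time. The cost weighting and the gating threshold below are illustrative assumptions, not the paper's exact matching criterion.

import numpy as np

def match_new_track(obs_pos, obs_time, predictions, w_time=0.5, gate=5.0):
    # predictions: list of (person_id, predicted_pos, predicted_time);
    # the same person may contribute several goal hypotheses.
    best_id, best_cost = None, gate
    for person_id, pred_pos, pred_time in predictions:
        spatial = np.linalg.norm(np.asarray(obs_pos) - np.asarray(pred_pos))
        cost = spatial + w_time * abs(obs_time - pred_time)
        if cost < best_cost:  # keep the closest prediction within the gate
            best_id, best_cost = person_id, cost
    return best_id  # None if no prediction falls within the gate

# Example: a new track appears at (3.0, 4.0) at t = 12 s.
preds = [("p1", (2.5, 4.2), 11.0), ("p2", (8.0, 1.0), 20.0)]
print(match_new_track((3.0, 4.0), 12.0, preds))  # -> "p1"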

Experimental results

To validate the proposed method, we use the i-LIDS dataset from the London Gatwick airport [8] and study the movement of people at the arrival terminal. We consider people who are visible when they walk out of the passenger area. The aim is to find where and when these people reappear in one of the next cameras in the public area. This is a challenging environment where people can potentially walk in many directions once they exit the camera view covering the passenger area. In addition to …

Conclusion and future work

We presented a method to estimate people's movements in the non-observed regions between camera views and demonstrated it on a person re-identification problem, without using appearance features, in a real surveillance scenario. The method is based on a modification of the Social Force Model and takes into account barrier avoidance constraints as well as the desired motion toward specific goals in the scene. Unlike existing methods that assume linear motion between cameras, we only assume that a …

References (27)

  • O. Javed, et al., Modeling inter-camera space–time and appearance relationships for tracking across non-overlapping views, Comput. Vision Image Understanding (2008).
  • B. Prosser, W.-S. Zheng, S. Gong, T. Xiang, Person re-identification by support vector ranking, in: Proceedings of the...
  • C.-H. Kuo, C. Huang, R. Nevatia, Inter-camera association of multi-target tracks by on-line learned appearance affinity...
  • V. Kettnaker, R. Zabih, Bayesian multi-camera surveillance, in: Proceedings of IEEE International Conference on...
  • O. Javed, Z. Rasheed, O. Alatas, M. Shah, KnightM: a real time surveillance system for multiple overlapping and...
  • O. Javed, Z. Rasheed, K. Shafique, M. Shah, Tracking across multiple cameras with disjoint views, in: Proceedings of...
  • R. Bowden, et al., Towards automated wide area visual surveillance: tracking objects between spatially-separated, uncalibrated views, IEE Proc. Vision Image Signal Process. (2005).
  • iLIDS, Home Office multiple camera tracking scenario definition (UK),...
  • A. Turner, et al., Encoding natural movement as an agent-based system: an investigation into human pedestrian behaviour in the built environment, Environ. Plann. B Plann. Design (2002).
  • A. Johansson, et al., Specification of a microscopic pedestrian model by evolutionary adjustment to video tracking data, Adv. Complex Syst. (2007).
  • B. Zhan, et al., Crowd analysis: a survey, Mach. Vision Appl. (2008).
  • R.L. Hughes, The flow of human crowds, Annu. Rev. Fluid Mech. (2003).
  • D. Bauer, S. Seer, N. Brändle, Macroscopic pedestrian flow simulation for designing crowd control measures in public...

Riccardo Mazzon received his bachelor's degree in 2006 and his MSc degree in 2009, both in computer engineering, from the University of Padova, Padua, Italy. Currently, he is a research student under the supervision of Prof. Andrea Cavallaro at the School of Electronic Engineering and Computer Science, Queen Mary University of London. His research interests are person detection, non-overlapping camera networks and human behavior understanding.

Andrea Cavallaro is a professor of multimedia signal processing at the School of Electronic Engineering and Computer Science, Queen Mary University of London (QMUL). He was awarded a research fellowship with BT labs in 2004; the Royal Academy of Engineering Teaching Prize in 2007; three Student Paper Awards at IEEE ICASSP in 2005, 2007, and 2009; and the Best Paper Award at IEEE AVSS 2009. He has published two books (Video Tracking and Multi-Camera Networks) and more than 100 papers. His research interests are target tracking and multimodal content analysis for multisensor systems.
