Elsevier

Pattern Recognition

Volume 47, Issue 4, April 2014, Pages 1626-1641

Modeling local behavior for predicting social interactions towards human tracking

https://doi.org/10.1016/j.patcog.2013.10.019

Highlights

  • We model multiple social effects in pedestrian dynamics.

  • We propose a decomposed motion model that approximates complex social interactions.

  • The algorithm adjusts the number of basic trackers dynamically based on the exact interaction.

Abstract

Human interaction dynamics are known to play an important role in the development of robust pedestrian trackers that are needed for a variety of applications in video surveillance. Traditional approaches to pedestrian tracking assume that each pedestrian walks independently and the tracker predicts the location based on an underlying motion model, such as a constant velocity or autoregressive model. Recent approaches have begun to leverage interaction, especially by modeling the repulsion forces among pedestrians to improve motion predictions. However, human interaction is more complex and is influenced by multiple social effects. This motivates the use of a more complex human interaction model for pedestrian tracking. In this paper, we propose a novel human tracking method by modeling complex social interactions. We present an algorithm that decomposes social interactions into multiple potential interaction modes. We integrate these multiple social interaction modes into an interactive Markov Chain Monte Carlo tracker and demonstrate how the developed method translates into a more informed motion prediction, resulting in robust tracking performance. We test our method on videos from unconstrained outdoor environments and evaluate it against common multi-object trackers.

Introduction

Multiple pedestrian tracking in unconstrained environments is an important task that has received considerable attention from the computer vision community in the past two decades. A number of approaches that address this problem have been proposed [1], [2]. Accurate multiple pedestrian tracking can greatly improve the performance of activity recognition and the analysis of high-level events in a surveillance system. However, the complexity of human motion poses several challenges to the accuracy and precision of any tracking system. In the context of video surveillance, human motion can be thought of as blob motion in which arms and legs are difficult or unnecessary to localize. At this scale, the study of human motion predominantly involves cues related to space and environment, and we can expect to recover how people move from place to place. Accordingly, recovering people's motion patterns makes it possible to measure social phenomena among interacting individuals [3]. Interpersonal distance cues have their basis in the seminal finding that people tend to organize the space around them in four concentric zones associated with different degrees of intimacy [4]. The spatial organization of people within these concentric zones is dominated by the relationships between interacting individuals [5]. Hence, the encoding of social relationships, combined with tracking methods, has been most commonly exploited in recent years to model human motion.

The integration of social relationships to address the dynamics of human motion has its origin in the social force model [6], which applies a fluid flow analogy to the dynamics of pedestrians. It is primarily a physical model that captures a continuous phenomenon in which humans are considered to react to energy potentials caused by other pedestrians and static obstacles while trying to keep a desired speed and motion direction. Recently proposed local motion models, such as the linear trajectory avoidance (LTA) model [7] and the human motion prediction model [8], demonstrate that leveraging social relationships can improve tracking performance. Typical social relationships can be envisioned through simple interaction effects that take forms such as: (1) attraction effects, (2) repulsion effects, and/or (3) no social effect. The attraction and repulsion effects can be characterized as the tendency to move toward or away from objects, respectively. The repulsion effect has been leveraged in most existing tracking methods, but modeling multiple effects of social relationships simultaneously remains challenging. Modeling motion based on repulsion effects alone captures only the intent to avoid collisions and excludes the possibility that people intend to meet. Unconstrained environments, however, typically involve people whose motion dynamics are explained by a combination of several basic social effects. In this paper, we present a model that embeds social relationships as a linear combination of predefined basic social effects.
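
As a rough illustration of what a linear combination of basic social effects could look like in a motion model, the Python sketch below adds a weighted sum of pairwise attraction and repulsion terms to a constant-velocity prediction. The exponential force profile, the function names `basic_force` and `predict_position`, and the parameters `strength`, `scale`, and `dt` are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

# Sign of each basic social effect along the direction from pedestrian i to j.
# Hypothetical encoding: attraction pulls toward j, repulsion pushes away.
EFFECT_SIGN = {"attraction": 1.0, "repulsion": -1.0, "none": 0.0}


def basic_force(p_i, p_j, effect, strength=1.0, scale=1.0):
    """Force on pedestrian i caused by pedestrian j under one basic effect.

    The exponential decay with distance is an assumed profile, not the paper's.
    """
    diff = np.asarray(p_j, dtype=float) - np.asarray(p_i, dtype=float)
    dist = np.linalg.norm(diff)
    if dist == 0.0:
        return np.zeros(2)  # direction undefined for coincident positions
    magnitude = strength * np.exp(-dist / scale)
    return EFFECT_SIGN[effect] * magnitude * diff / dist


def predict_position(p_i, v_i, p_j, weights, dt=1.0):
    """Constant-velocity prediction plus a weighted sum of basic effects."""
    force = np.zeros(2)
    for effect, weight in weights.items():
        force += weight * basic_force(p_i, p_j, effect)
    velocity = np.asarray(v_i, dtype=float) + force * dt  # unit mass assumed
    return np.asarray(p_i, dtype=float) + velocity * dt
```

In this toy setting, equal weights on repulsion and attraction cancel each other, so the predicted motion depends on the weights rather than on any single fixed effect.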

Generally, the intent of pedestrians produces different social relationships: the intent of avoidance is explained by the repulsion effect, and the intent of approach is explained by the attraction effect. Because intent varies over time, the motion prediction of the corresponding trackers should be adjusted dynamically depending on the current interaction environment. A specific limitation of many trackers is that the motion model used to predict the dynamics of a target is fixed, typically a first-order approximation. Such a model fails to capture the complex motion that results from pedestrians' varying intent and the corresponding interactions. Our approach focuses on how to incorporate temporally varying pedestrian interaction or intent into a dynamic motion model without explicit knowledge of local social relationships. Although the desired mode of interaction is unknown, the intent of pedestrians can be assumed to belong to a finite set that combines the intents of avoidance, approach, and non-interaction [9]. This finite set of intents generates a finite set of interactions. We propose to decompose complex pedestrian interaction into a finite set of interactions, where the decomposition is motivated by the work of Kwon and Lee [10].

Consider a simple scenario with two pedestrians, as illustrated in Fig. 1, wherein pedestrians can either decide to meet and interact with others or choose their motion direction to avoid colliding with others. By modeling their intents in this case (interaction modes), local interactions can be hypothesized to guide tracking. Conversely, the tracking output validates the mode of social interaction. If we model the local interaction between the two pedestrians under the intent of either avoidance or approach, our method predicts two possible motions for each pedestrian. It then searches for the best tracking result by sampling the pedestrians' state space. In turn, the best tracking result identifies, through a linear search strategy, the intent under which local interaction effects contribute most accurately to prediction.
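
The interplay between hypothesized modes and tracking output in this two-pedestrian scenario can be sketched in Python as follows. Each candidate mode yields one predicted position, and a linear search over the modes keeps the one closest to an observed position. Comparing against a single observation stands in for the sampling-based search described in the text, and `predict`, `best_mode`, and the `gain` parameter are hypothetical names introduced only for this sketch.

```python
import numpy as np

MODES = ("repulsion", "attraction", "none")
SIGN = {"repulsion": -1.0, "attraction": 1.0, "none": 0.0}


def predict(p_i, v_i, p_j, mode, gain=0.5, dt=1.0):
    """Predict pedestrian i's next position under one hypothesized mode."""
    p_i = np.asarray(p_i, dtype=float)
    v_i = np.asarray(v_i, dtype=float)
    diff = np.asarray(p_j, dtype=float) - p_i
    dist = np.linalg.norm(diff)
    # Assumed constant-magnitude push toward (attraction) or away (repulsion).
    push = np.zeros(2) if dist == 0.0 else SIGN[mode] * gain * diff / dist
    return p_i + (v_i + push * dt) * dt


def best_mode(p_i, v_i, p_j, observed):
    """Linear search over modes: keep the one whose prediction fits best."""
    observed = np.asarray(observed, dtype=float)
    errors = {
        mode: float(np.linalg.norm(predict(p_i, v_i, p_j, mode) - observed))
        for mode in MODES
    }
    return min(errors, key=errors.get), errors
```

For example, with pedestrian i at the origin walking along the x-axis and pedestrian j located above it, an observation that drifts upward is best explained by the attraction mode, while one that drifts downward is best explained by repulsion.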

The key contributions of our work are as follows:

  1. Local interaction model that explicitly includes repulsion, attraction, and non-interaction. We model repulsion, attraction, and non-interaction effects in pedestrian dynamics. Such interactions are more common in unconstrained environments and can be leveraged to capture various interaction behaviors such as people meeting, people following, and/or group interactions.

  2. A decomposed social interaction model. We propose a decomposed motion model that approximates complex social interactions by tracking all possible combinations of basic interaction effects among multiple pedestrians. It enables motion prediction without knowledge of the instantaneous interaction modes.

  3. A dynamically adjusted state space. The algorithm adjusts the number of basic trackers dynamically based on the exact interaction among pedestrians, which expands or shrinks the joint state space to facilitate the search for tracking results.
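
To make the idea of enumerating combinations of basic interaction effects concrete, the following Python sketch lists one hypothesis per combination of pairwise modes and shrinks the set when pedestrians are too far apart to interact. The distance threshold `radius`, the rule that distant pairs are fixed to non-interaction, and the function name `mode_hypotheses` are assumptions for illustration; the paper's actual criterion for adding or removing basic trackers is not given in this section.

```python
import math
from itertools import combinations, product

MODES = ("repulsion", "attraction", "none")


def mode_hypotheses(positions, radius=3.0):
    """Enumerate joint interaction hypotheses for a set of pedestrians.

    Returns a list of dicts mapping each pedestrian pair (i, j) to a mode.
    Pairs farther apart than `radius` (an assumed threshold) are restricted
    to "none", which keeps the number of hypotheses small.
    """
    pairs = list(combinations(range(len(positions)), 2))
    options = []
    for i, j in pairs:
        close = math.dist(positions[i], positions[j]) <= radius
        options.append(MODES if close else ("none",))
    # One hypothesis per element of the Cartesian product of per-pair options.
    return [dict(zip(pairs, combo)) for combo in product(*options)]
```

Under these assumptions, three mutually close pedestrians produce 3^3 = 27 hypotheses, whereas moving one of them far away leaves only the 3 hypotheses for the remaining close pair.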

This paper is an extension of our work in [11] that details and generalizes our proposed approach along with additional experiments to evaluate the benefits of the developed tracker. Specifically, (1) synthetic experiments are presented and analysis performed to evaluate the accuracy of social interaction mode prediction and its impact on tracking performance; (2) various parameters of the proposed framework are evaluated and results presented to better understand their impact on tracking performance; and (3) a more detailed comparison is presented to validate the advantage of modeling multiple basic social effects including approach, avoidance, and non-interaction as compared to existing trackers that incorporate social effects to model motion dynamics. The rest of this paper is organized as follows. Section 2 describes related work. Section 3 presents the proposed social interaction model and describes its decomposition into multiple models. The incorporation of the proposed model within a Bayesian tracking framework and the design of the compound tracker is presented in Section 4. Section 5 presents the experiments performed and a qualitative and quantitative assessment of the tracker performance. Comparative analysis against multiple existing trackers is also presented. Finally, conclusions are presented in Section 6.

Section snippets

Related work

Previous tracking algorithms mainly address two aspects: coping with variation in target appearance and modeling complex target motion. To account for appearance variation of the target caused by changes in illumination, deformation, and pose, a large body of work has been proposed [12], [13], [14], [15], [16], [17], and these methods perform well. However, the dynamics of targets and the interactions between targets are much less explored. The state space of targets is

Social interaction model

The social force model by Helbing [29] is a computational model in which the interactions among pedestrians are described by using the concept of forces between physical entities. Each pedestrian feels a social force from other pedestrians that is proportional to the distance between them. In this model, a pedestrian i=1,…,H makes motion decisions based on the sum of forces Fi exerted. Under the modeled social force, the motion model that predicts the positional information for a tracked

Experiments

To evaluate the merit of our proposed model, we perform experiments on both synthesized data and real scenes. Synthesized data is generated to evaluate various parameters in the model. Real scenes are tested to compare the performance of the proposed method against different existing trackers as well as to compare the effect of different functions that could be used to model social forces. Two video sequences were included from the “BEHAVE” Interactions Test Case Scenarios [33]. The videos were

Conclusion and future work

In this paper, we have proposed a new dynamic model for tracking multiple pedestrians. The method leverages the social interaction decomposition to approximate a broader set of human interaction behaviors in unconstrained environments. To the best of our knowledge, this is the first time the social force model has been extended to simultaneously model multiple interaction behaviors in human tracking. The proposed dynamic model is decomposed through the construction of multiple basic trackers,

Conflict of interest

None declared.

Acknowledgement

This work was supported in part by the US Department of Justice 2009-MU-MU-K004. Any opinions, findings, conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of our sponsors.

Xu Yan is a Ph.D. candidate in the Department of Computer Science at the University of Houston. He received his B.E. and M.E. degrees in Electrical Engineering from Hunan University, China. His current research focuses on the fundamentals of computer vision, pattern recognition, and digital image processing, with applications in video analytics and wide-area distributed camera systems.

References (37)

  • H. Yang et al., Recent advances and trends in visual tracking: a review, Neurocomputing (2011)
  • A. Yilmaz et al., Object tracking: a survey, ACM Comput. Surv. (2006)
  • E.T. Hall (1966)
  • E. Goffman, Behaviour in Public Places, Notes on the Social Organisation of Gatherings (1963)
  • V. Richmond et al., Nonverbal Behavior in Interpersonal Relations (2007)
  • D. Helbing et al., Social force model for pedestrian dynamics, Phys. Rev. E (1995)
  • S. Pellegrini, A. Ess, K. Schindler, L. Van Gool, You'll never walk alone: modeling social behavior for multi-target...
  • M. Luber, J. Stork, G. Tipaldi, K. Arras, People tracking with human motion prediction from social forces, in:...
  • R.J. Sethi, A.K. Roy-Chowdhury, Modeling and recognition of complex multi-person interactions in video, in: Proceedings...
  • J. Kwon, K. Lee, Visual tracking decomposition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern...
  • X. Yan, I. Kakadiaris, S. Shah, Predicting social interactions for visual tracking, in: Proceedings of the British...
  • S.K. Zhou et al., Visual tracking and recognition using appearance-adaptive models in particle filters, IEEE Trans. Image Process. (2004)
  • A. Adam, E. Rivlin, I. Shimshoni, Robust fragments-based tracking using the integral histogram, in: Proceedings of the...
  • H. Grabner, H. Bischof, On-line boosting and vision, in: Proceedings of the IEEE Conference on Computer Vision and...
  • D.A. Ross et al., Incremental learning for robust visual tracking, Int. J. Comput. Vision (2008)
  • X. Mei, H. Ling, Robust visual tracking using l1 minimization, in: Proceedings of the International Conference on...
  • B. Babenko, M.-H. Yang, S. Belongie, Visual tracking with online multiple instance learning, in: Proceedings of the...
  • P. Perez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: Proceedings of the European...

    Ioannis A. Kakadiaris is a Hugh and Lillie Cranz Cullen Distinguished University Professor of Computer Science, Electrical & Computer Engineering, and Biomedical Engineering at the University of Houston, Houston, TX, USA. He earned his B.Sc. in physics at the University of Athens in Greece, his M.Sc. in computer science from Northeastern University and his Ph.D. at the University of Pennsylvania. He is the founder and director of the Computational Biomedicine Lab. His research interests include cardiovascular informatics, biomedical image analysis, biometrics, computer vision, and pattern recognition.

    Shishir K. Shah is an Associate Professor of Computer Science at the University of Houston. He received his B.S. degree in Mechanical Engineering and his M.S. and Ph.D. degrees in Electrical and Computer Engineering from The University of Texas at Austin. He directs research at the Quantitative Imaging Laboratory, and his current research focuses on the fundamentals of computer vision, pattern recognition, and statistical methods in image analysis, with applications in multi-modality sensing, video analytics, biometrics, object recognition, and biomedical image analysis.
