Feature fusion for basic behavior unit segmentation from video sequences

https://doi.org/10.1016/j.robot.2008.10.018

Abstract

It has become increasingly popular to study animal behaviors with the assistance of video recordings. An automated video processing and behavior analysis system is desired to replace traditional manual annotation. We propose a framework for automatic video-based behavior analysis systems, which consists of four major modules: behavior modeling, feature extraction from video sequences, basic behavior unit (BBU) discovery, and complex behavior recognition. BBU discovery is performed on features extracted from video sequences, so the fusion of multi-dimensional features is very important. In this paper, we explore the application of feature fusion techniques to BBU discovery with one and multiple cameras. We apply the vector fusion (SBP) method, a multi-variate vector visualization technique, to fuse the features obtained from a single camera. This technique reduces multi-dimensional data to a two-dimensional (SBP) space, and spatial and temporal analysis in SBP space can help discover the underlying data groups. We then present a simple feature fusion technique for BBU discovery from multiple cameras using the affinity graph method. Finally, we present encouraging results on a physical system and a synthetic mouse-in-a-cage scenario with one, two, and three cameras. The feature fusion methods in this paper are simple yet effective.

Introduction

It has become an increasingly important research area to automatically analyze object behaviors from visually captured data (e.g., motion capture) or video recordings. The major tasks are to automatically detect and track objects in video sequences and to analyze their high-level activities or behaviors. Humans and vehicles have mostly been the focus of visual surveillance and behavior understanding research [1], [2], [3], [4] for security purposes, e.g., access control in restricted areas and anomaly detection in crowded mass transportation areas.

In biology, pharmacology, toxicology, entomology, and animal welfare, video recordings are widely used to analyze the behaviors of animals (e.g., lab mice, rodents, poultry, and wild animals). The traditional human annotation approach is time consuming, and results may vary from one observer to another. Hence, automatic animal behavior analysis from visual data is drawing more and more attention in both the research and industrial communities [5], [6].

In visual robot control, it is also desirable for robots to automatically learn and recognize behaviors from motion capture or visual data [7], [8], [9], which would enable intelligent robots to respond to the visual information captured by cameras.

Among all the efforts toward an automated behavior analysis system, basic behavior unit (BBU) classification (or segmentation) is one important task [10]. Usually the sequences of visual data from images first need to be grouped into BBUs [11], or primitive (atomic) behaviors [7], and complex behaviors are then analyzed based upon the relationships between the BBUs and the context. Prior to the BBU segmentation step, spatiotemporal features are usually extracted: in the literature, interest points, shape properties of the detected object blobs, contours, or features derived from them are used to perform BBU classification. Feature extraction is itself an important task.

In the literature, researchers have tried to solve the BBU classification and feature extraction tasks separately. In this paper, we take an integrated approach and propose a framework for such an automatic behavior analysis system. We first present the framework, and then focus on feature fusion techniques for BBU discovery: we explore the vector fusion method [12] for feature dimension reduction, and the fusion of features from multiple cameras using the affinity graph method.

Our research is motivated by the needs of a professor of medicine, who is interested in the automatic video analysis of behavior changes before and after injecting a certain medicine into a lab mouse, as shown in Fig. 1. The behaviors of interest include resting, eating, exploring, and, most importantly, grooming. In this paper, we use behaviors from the mouse-in-a-cage scenario for our experiments and analysis.

Section snippets

Automatic animal behavior analysis framework

Here we present a four-module framework for video-based animal behavior analysis: behavior modeling, feature extraction, basic behavior unit (BBU) discovery, and complex behavior analysis, as shown in Fig. 2 (see [10] for a detailed description of the relationships between the four blocks enclosed in the dashed box); a code sketch of the data flow appears at the end of this section.

Behavior modeling. We need to define, characterize, and represent the behaviors of interest in terms of three factors: physical (spatiotemporal) features; the relationship between these
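As a reading aid, the data flow between these modules can be pictured as a simple pipeline. The following Python skeleton is our own illustration; the names, signatures, and the rule-based complex-behavior step are assumptions for exposition, not the paper's implementation.

    from dataclasses import dataclass
    from typing import Callable, List, Sequence

    # Hypothetical skeleton of the four-module framework of Fig. 2.
    # All names and signatures here are illustrative, not the paper's API.

    @dataclass
    class BehaviorModel:                            # Behavior modeling module
        name: str                                   # e.g. "grooming"
        matches: Callable[[List[int]], bool]        # rule over a BBU label sequence

    def analyze(frames: Sequence,
                extract_features: Callable,         # Feature extraction module
                discover_bbus: Callable,            # BBU discovery module
                models: List[BehaviorModel]) -> List[str]:
        """Chain the modules: per-frame features -> BBU labels -> complex behaviors."""
        features = [extract_features(f) for f in frames]
        bbu_labels = discover_bbus(features)        # e.g. vector fusion or affinity graph
        return [m.name for m in models if m.matches(bbu_labels)]

The point of the skeleton is only that BBU discovery mediates between low-level features and high-level behavior recognition, which is where the feature fusion methods discussed below plug in.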

Related work

In the visual surveillance literature, most existing techniques extract basic behaviors (or actions) directly from one or more features (trajectory, motion, posture, etc.) computed from the detection and tracking results. Pattern recognition techniques (template matching, clustering analysis) are used to classify the video sequence into actions or behavior units, as discussed in the survey papers [1], [2], [3], [4]. These methods are effective in their specific applications.

The vector fusion algorithm for BBU segmentation

In this section, we describe Johnson’s vector fusion method (denoted SBP, for Single-point Broken-line Parallel-coordinate, in [12], [21]) and how we apply it to BBU discovery.

The vector fusion method is a vectorized generalization of the parallel coordinates method [22] for visualizing multi-dimensional datasets, which allows one to view any number of dimensions concurrently by arranging the coordinate axes parallel to each other. The vector fusion method maps a multi-variate vector to a single point in a 2D (SBP) space.
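The published snippet stops short of the mapping itself, but the idea can be sketched as follows: assign each feature dimension a fixed direction in the plane (the analogue of a parallel-coordinate axis), scale a unit vector in that direction by the normalized feature value, and sum the vectors to obtain one 2D point per sample. A minimal Python sketch, in which the even half-circle layout of axis directions and the min-max normalization are our assumptions rather than Johnson's exact choices [12]:

    import numpy as np

    def vector_fusion(X, angles=None):
        """Map each row of X (n_samples x n_dims) to one point in 2D SBP space."""
        X = np.asarray(X, dtype=float)
        n_samples, n_dims = X.shape
        # Min-max normalize each feature so no single dimension dominates.
        lo, hi = X.min(axis=0), X.max(axis=0)
        Xn = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
        if angles is None:
            # Spread the per-dimension axis directions evenly over a half circle.
            angles = np.linspace(0.0, np.pi, n_dims, endpoint=False)
        dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (n_dims, 2)
        return Xn @ dirs                                            # (n_samples, 2)

Plotting the fused points frame by frame then supports the spatial and temporal analysis described above: frames belonging to the same BBU tend to form clusters in SBP space, and transitions between clusters suggest BBU boundaries.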

The affinity graph method

We propose to use the affinity graph method, an unsupervised learning method, to discover basic behavior units. First, spatiotemporal features are extracted from the video frames, as in the Feature Extraction block shown in Fig. 2. Then we take a subsequence (of length T) of the features extracted from the video images as an element, and calculate the affinity measure between each pair of elements to construct the affinity matrix. Each element overlaps the next element by a couple of frames.
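The snippet cuts off before describing how the affinity matrix is used, so the sketch below fills in the remaining steps with common choices that are our assumptions: a Gaussian affinity between flattened windows, and spectral clustering on the precomputed affinity matrix (in the spirit of [13], [14]). The window length T, step size, kernel width, and number of units are illustrative parameters.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def bbu_affinity_segmentation(F, T=15, step=5, sigma=1.0, n_units=4):
        """Cluster overlapping feature subsequences into candidate BBUs.

        F: (n_frames, n_features) array of per-frame features.  For multiple
        cameras, one simple fusion is to concatenate each camera's per-frame
        features before windowing (our reading of the paper's approach).
        """
        F = np.asarray(F, dtype=float)
        starts = list(range(0, len(F) - T + 1, step))        # overlapping windows
        E = np.stack([F[s:s + T].ravel() for s in starts])   # one element per window
        d2 = ((E[:, None, :] - E[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        A = np.exp(-d2 / (2.0 * sigma ** 2))                 # Gaussian affinity matrix
        labels = SpectralClustering(n_clusters=n_units,
                                    affinity='precomputed').fit_predict(A)
        return starts, labels                                # window starts, BBU labels

Consecutive windows that receive the same label can then be merged into one BBU segment, with the overlap smoothing the boundaries between segments.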

Vector fusion for BBU discovery

We experimented with the vector fusion method on data derived from two cases: (1) a bouncing ball, and (2) an artificial mouse.

Conclusions

We propose a framework for video-based animal behavior analysis, and concentrate on feature fusion methods for BBU discovery. We have explored the vector fusion method for its application to basic behavior unit segmentation in a temporal sequence, and presented results on a physical system and a synthetic mouse-in-a-cage scenario. The vector fusion method reduces multi-dimensional data to the 2D SBP space, and spatial and temporal analysis in SBP space provides a good means of discovering the underlying data groups.

Acknowledgments

The authors thank Bob Johnson for helpful discussions on the vector fusion method.


References (29)

  • T. Moeslund et al., A survey of computer vision-based human motion capture, Computer Vision and Image Understanding (2001)
  • J. Aggarwal et al., Human motion analysis: A review, Computer Vision and Image Understanding (1999)
  • L. Wang et al., Recent developments in human motion analysis, Chinese Journal of Computers (2002)
  • W. Hu et al., A survey on visual surveillance of object motion and behaviors, IEEE Transactions on Systems, Man, and Cybernetics (2004)
  • P. van Lochem et al., Automatic recognition of behavioral patterns of rats using video imaging and statistical classification
  • L. Noldus et al., EthoVision: A versatile video tracking system for automation of behavioral experiments, Behavior Research Methods, Instruments, & Computers (2001)
  • O.C. Jenkins, M.J. Mataric, Deriving action and behavior primitives from human motion data, in: Proc. IEEE/RSJ Int....
  • A. Fod et al., Automated derivation of primitives for movement classification, Autonomous Robots (2002)
  • J. Barbic, A. Safonova, J.-Y. Pan, C. Faloutsos, J.K. Hodgins, N.S. Pollard, Segmenting motion capture data into...
  • X. Xue, T.C. Henderson, Video-based animal behavior analysis, University of Utah, TechReport UUCS-06-006, June...
  • T.C. Henderson, X. Xue, Construct complex behaviors: A simulation study, in: ISCA 18th Intl. Conf. on Computer...
  • R. Johnson, Visualization of multi-dimensional data with vector fusion, in: IEEE Proc. Visualization, 2000, pp....
  • L. Zelnik-Manor, M. Irani, Event-based analysis of video, in: Proc. IEEE CVPR, Hawaii,...
  • F. Porikli, T. Haga, Event detection by eigenvector decomposition using object and frame features, in: Workshop on...

Xinwei Xue is currently working with Fair Isaac Corporation as an Analytic Science Scientist. He received his Ph.D. degree in Computer Science from the School of Computing, University of Utah, in 2008, and his B.S. and M.S. degrees in Precision Instruments from Tianjin University in 1997 and 2000, respectively. His research interests include image processing, computer vision, video-based object behavior analysis, artificial intelligence, and machine learning.

Thomas C. Henderson received his B.S. in Math with Honors from Louisiana State University in 1973 and his Ph.D. in Computer Science from the University of Texas at Austin in 1979. He is currently a full Professor in the School of Computing at the University of Utah. He has been at Utah since 1982, and was a visiting professor at DLR in Germany in 1980, at INRIA in France in 1981 and 1987, and at the University of Karlsruhe, Germany in 2003. Prof. Henderson was chairman of the Department of Computer Science at Utah from 1991 to 1997, and was the founding Director of the School of Computing from 2000 to 2003. Prof. Henderson is the author of Discrete Relaxation Techniques (Oxford University Press), and editor of Traditional and Non-Traditional Robotic Sensors (Springer-Verlag); he served for 15 years as Co-Editor-in-Chief of the Journal of Robotics and Autonomous Systems and was an Associate Editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence and the IEEE Transactions on Robotics and Automation. His research interests include autonomous agents, robotics and computer vision, and his ultimate goal is to help realize functional androids. He has produced over 200 scholarly publications, and has been principal investigator on over $8M in research funding. Prof. Henderson is a Fellow of the IEEE, and received the Governor’s Medal for Science and Technology in 2000. He enjoys good dinners with friends, reading, playing basketball and hiking.
