Eigenspace-based fall detection and activity recognition from motion templates and machine learning

https://doi.org/10.1016/j.eswa.2011.11.109

Abstract

Automatic recognition of anomalous human activities and falls in an indoor setting from video sequences could be an enabling technology for low-cost, home-based health care systems. Detection systems based upon intelligent computer vision software can greatly reduce the costs and inconveniences associated with sensor-based systems. In this paper, we propose such a system, based upon a spatio-temporal motion representation, called Motion Vector Flow Instance (MVFI) templates, that captures relevant velocity information by extracting the dense optical flow from video sequences of human actions. Automatic recognition is achieved by first projecting each human action video sequence, consisting of approximately 100 images, into a canonical eigenspace, and then performing supervised learning to train multiple actions from a large video database. We show that our representation, together with a canonical transformation of image sequences by PCA and LDA, provides excellent action discrimination. We also demonstrate that by including both the magnitude and direction of the velocity in the MVFI, sequences with abrupt velocities, such as falls, can be distinguished from other daily human actions with both high accuracy and computational efficiency. As an added benefit, we demonstrate that, once trained, our method for detecting falls is robust and attains real-time performance.

Introduction

Automatically determining human actions and gestures from videos or from real-time surveillance cameras has received considerable attention both in the academic literature and in commercial applications. Intelligent surveillance systems for the health care industry are particularly attractive since they promise to increase the quality of remote care as well as reduce the growing costs of present remote care methods. Indeed, due to the marked increase in the percentage of elderly persons compared to that of working age population, intelligent home surveillance systems and applications will play an important role in future personalized care systems. For the elderly, video based monitoring could provide a convenient and comprehensive detection system for anomalous behavior, such as falls or excessive inactivity.

In general, determining human motion is a difficult problem in computer vision, and there are many different approaches, ranging from tracking the full 3D body motion with multiple cameras to Bayesian inference tracking. A recent review by Poppe (2010) provides an updated account of several of the most successful methods. For obtaining information about more limited human motions, such as anomalous activities and falls, full 3D tracking produces an overabundance of information at a huge computational cost. A simpler and computationally viable approach can be found in work on human gait characterization, where dimensionality reduction transforms a sequence of images into points within a canonical space, used to distinguish types of human gaits by Huang, Harris, and Nixon (1999a) and disorders such as degrees of Parkinson's disease by Cho, Chao, Lin, and Chen (2009). More recently, other authors have described fall detection systems based upon video sequences (see Liu, Lee, & Lin, 2010) using similar techniques.

This paper describes a computer vision software system and algorithms for the detection of human activity using a canonical eigenspace transformation of novel spatio-temporal motion templates. Machine learning algorithms are applied to discern the following common activities: walking, walking exaggerated, jogging, bending over, lying down, and falling. We show that fall detection can be accomplished with considerable accuracy without the use of sensors or a full reconstruction of the 3D human posture. Thus, it is an effective and inexpensive method that can be implemented for real-time monitoring.

In particular, we introduce a new representation, denoted MVFI (Motion Vector Flow Instance) templates, which together with eigenspace methods provides robust detection for a wide class of indoor human motions. The MVFI template encodes both the magnitude and direction of the optical flow vector from each frame of a motion sequence. Instead of coloring an entire block the same color (as suggested by the MFH (motion flow history) of Venkatesh Babu & Ramakrishnan (2004)), we scale the size of the box independently in the x and y directions. This allows us to distinguish vertical and horizontal movements with high precision, which is what a fall detection algorithm needs. In our method, the MVFI templates are extracted and projected into a canonical eigenspace. The projected image templates are used to train LDA classifiers for recognizing a set of six human actions. The technique works well because it is specifically sensitive to the large horizontal and vertical velocities encountered in falls.
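The per-block box rendering described above can be sketched as follows. The dense optical flow itself would come from a standard estimator (e.g. OpenCV's Farneback method on consecutive grayscale frames); the sketch below assumes the flow field is already available and illustrates only the template rendering. The block size, gain, and exact intensity encoding are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def mvfi_template(flow, block=16, gain=4.0):
    """Render an MVFI-style template from a dense optical-flow field.

    flow : (H, W, 2) array of per-pixel (vx, vy) displacements.
    For each block, a filled rectangle is drawn whose half-width is
    proportional to |vx| and half-height to |vy|, so horizontal and
    vertical motion stretch the box independently; the rectangle's
    intensity encodes the flow magnitude.
    """
    h, w, _ = flow.shape
    tmpl = np.zeros((h, w), dtype=np.float32)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            vx = flow[by:by + block, bx:bx + block, 0].mean()
            vy = flow[by:by + block, bx:bx + block, 1].mean()
            hw = min(int(abs(vx) * gain), block // 2)   # half-width  from |vx|
            hh = min(int(abs(vy) * gain), block // 2)   # half-height from |vy|
            if hw == 0 and hh == 0:
                continue                                # no motion in this block
            cy, cx = by + block // 2, bx + block // 2   # block center
            tmpl[max(cy - hh, 0):cy + hh + 1,
                 max(cx - hw, 0):cx + hw + 1] = np.hypot(vx, vy)
    return tmpl
```

A fall produces large |vy| averages, so the rendered boxes become tall and bright, which is exactly the signature the eigenspace classifier later picks up.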

Section snippets

Related work

Motivation for fall detection can be found in recent work by Larson & Bergmann (2008), who described the etiology of falls in the elderly. The health risk assessment of falls in the elderly has been studied by Moylan & Binder (2007), who provide ample statistics demonstrating the seriousness of falls as a major health risk. For example, nearly one third of people over 65 years of age fall each year, and 10–15% of those falls result in serious injury. More importantly, 75% of those with fractures do

Theory and algorithms

The PCA-based eigenspace method we have used consists of several coordinated steps that train our system to automatically detect falls and actions from video sequences. First, a set of canonical transformations (consisting of PCA and LDA) dramatically reduces a sequence of images to a set of points in a multidimensional space of much smaller dimension. The principal workflow of the system is shown in Fig. 1. Supervised learning consists of training the
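The reduction step can be illustrated with a minimal numpy-only sketch: PCA via SVD, followed by a two-class Fisher discriminant in the reduced space. The paper trains multi-class LDA over six actions on real template sequences; the toy data, dimensions, and regularization term here are illustrative assumptions only.

```python
import numpy as np

def pca(X, k):
    """Return the data mean and the top-k principal axes (rows of Vt)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def fisher_lda(Z, y):
    """Two-class Fisher discriminant direction in the reduced space."""
    m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    Sw = np.cov(Z[y == 0].T) + np.cov(Z[y == 1].T)   # within-class scatter
    w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), m1 - m0)
    return w / np.linalg.norm(w)

# Toy stand-ins for flattened motion-template vectors of two actions
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, (40, 100))   # e.g. "walking" templates
X1 = rng.normal(0.0, 1.0, (40, 100))
X1[:, 0] += 5.0                        # "falling" shifted along one axis
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 40)

mu, P = pca(X, 10)          # reduce 100-D templates to a 10-D eigenspace
Z = (X - mu) @ P.T          # points in the canonical space
w = fisher_lda(Z, y)        # direction that best separates the two classes
```

Projecting a new sequence onto `P` and then onto `w` yields a scalar score; thresholding that score is the simplest possible classifier in this reduced space.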

Experimental results

We performed multiclass training between all possible combinations of human actions in this study, for N_A = 2, 3, 4, 5, 6. Moreover, for each multiclass training combination and motion template, we performed an N-fold cross validation between all video sequences in our dataset obtained from different people, as described in the previous section. From these extensive tests, we could understand how average recognition results are affected by adding sequences from different people to the training.
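The evaluation protocol above can be sketched as follows, using the six action labels listed in the introduction; the fold count, seed, and sequence count are illustrative, not the paper's exact settings.

```python
import itertools
import numpy as np

actions = ["walking", "walking_exaggerated", "jogging",
           "bending_over", "lying_down", "falling"]

# All multiclass combinations for N_A = 2..6, as in the experiments:
# C(6,2) + C(6,3) + C(6,4) + C(6,5) + C(6,6) = 15 + 20 + 15 + 6 + 1 = 57
combos = [c for n in range(2, 7)
          for c in itertools.combinations(actions, n)]

def kfold(n_sequences, k, seed=0):
    """Split sequence indices into k disjoint (train, test) folds."""
    idx = np.random.default_rng(seed).permutation(n_sequences)
    folds = np.array_split(idx, k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
            for i in range(k)]
```

For each entry of `combos`, one would train the PCA/LDA classifier on every train split and average the recognition rate over the test splits, giving the per-combination results summarized in the tables.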

Conclusions and further research

For the cases we considered, our recognition rates are consistent with those from similar work, as recently reported in Ahmad & Lee (2010). Nonetheless, a meaningful direct comparison would require running the reported methods on our datasets and quoting recognition rates in the same way as in this paper.

This paper has compared two different motion templates described previously with a new spatio-temporal motion template, MVFI, which we have proposed

References (36)

  • R. Venkatesh Babu et al. Recognition of human actions using motion history information extracted from the compressed video. Image and Vision Computing (2004)
  • G. Wu. Distinguishing fall activities from normal activities by velocity characteristics. Journal of Biomechanics (2000)
  • M.A.R. Ahad et al. Temporal motion recognition and segmentation approach. International Journal on Imaging Systems and Technologies (2009)
  • A.F. Bobick & J.W. Davis. An appearance-based representation of action. In Proceedings of the 13th... (1996)
  • A.F. Bobick et al. The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
  • A. Bourke, K. O’Donovan & G. OLaighin. Distinguishing falls from normal ADL using vertical velocity... (2007)
  • G. Bradski. The OpenCV library. Dr. Dobb’s Journal of Software... (2000)
  • D. Chen, A.J. Bharucha & H.D. Wactlar. Intelligent video monitoring to improve safety of older persons. In... (2007)