Efficient motion estimation methods for fast recognition of activities of daily living
Introduction
The recognition of Activities of Daily Living (ADLs) has drawn significant research attention in the computer vision community, as their monitoring can provide valuable information for applications such as assisted living, remote healthcare, and lifestyle and behavioral profiling. To this end, efforts are being made to design ADL recognition algorithms that are both accurate and computationally efficient. While a plethora of activity recognition methods exist, most overlook the importance of computational and compression efficiency, focusing only on recognition accuracy. The main bottleneck of current State-of-the-Art (SoA) works [1], [2], [3], [4], [5], [6], [7] is the use of computationally expensive Optical Flow (OF) [8] for motion estimation and feature extraction.
This work addresses the issue of computational efficiency by expanding upon our previous work in [9]: computationally costly dense OF is replaced by computationally lighter Block Matching (BM) and MPEG encoded motion vectors for activity recognition. The motion field is post-processed and its results are incorporated in a dense, trajectory-based activity recognition framework [1]. In-depth experiments on benchmark ADL video datasets compare some of the most reliable and popular OF and BM methods, as well as the most common encoded motion vectors, demonstrating that the latter increase computational efficiency at a minimal loss in recognition accuracy, compared to related work.
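For concreteness, block matching estimates one motion vector per block by minimising a dissimilarity measure, typically the Sum of Absolute Differences (SAD), over a search window in the previous frame. The following is a minimal, illustrative sketch of exhaustive (full-search) BM, not the exact implementation evaluated in this work; the function name, block size and search radius are placeholders:

```python
import numpy as np

def block_match(prev, curr, block=8, radius=4):
    """Exhaustive block matching: for each block of `curr`, find the
    displacement (dy, dx) into `prev` that minimises the sum of
    absolute differences (SAD) within +/- `radius` pixels."""
    H, W = curr.shape
    vectors = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(H // block):
        for bx in range(W // block):
            y, x = by * block, bx * block
            ref = curr[y:y + block, x:x + block].astype(int)
            best, best_mv = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    py, px = y + dy, x + dx
                    if py < 0 or px < 0 or py + block > H or px + block > W:
                        continue  # candidate block falls outside the frame
                    cand = prev[py:py + block, px:px + block].astype(int)
                    sad = np.abs(ref - cand).sum()
                    if best is None or sad < best:
                        best, best_mv = sad, (dy, dx)
            vectors[by, bx] = best_mv
    return vectors
```

The cubic cost in block count and search radius is exactly what fast BM variants (and reuse of encoder-side vectors) aim to avoid.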
Recent works [10], [11], [12], [13] also used motion vectors drawn directly from the MPEG compressed video domain, resulting in a significant computational speedup (~66%) at a small reduction of recognition accuracy (~5%). We extend these works by providing a thorough comparison of very popular video compression standards, applied specifically to the recognition of ADLs, in contrast to more generic works [14], [15], [16]. We also investigate various configuration parameters of MPEG video encoding, such as GOP size and the motion estimation algorithm used, to examine their effect on recognition accuracy and compression efficiency (video quality and bit rate), and to identify those that make a difference in the measured performance. Finally, we use the precomputed MPEG motion vectors to seed and accelerate the BM search. This analysis reveals trade-offs between bit rate (file size), PSNR (video quality) and ADL recognition accuracy, resulting in useful guidelines for practitioners. In short, our contributions are:
- A framework for efficient recognition and coding of human activities in video, exploring the trade-offs at all stages: (1) video compression efficiency, (2) computational efficiency, (3) recognition accuracy.
- We propose and evaluate the use of existing compressed-domain motion vectors in conjunction with BM, for faster recognition at comparable accuracy and with computational savings.
- A thorough Rate-Distortion-based comparison (bit rate vs. video quality) between very popular video compression standards, applied specifically to activities of daily living.
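The second contribution can be sketched as follows: instead of scanning a full search window, the SAD search is restricted to a small neighbourhood around a motion vector already present in the compressed stream. The helper below is hypothetical; the function names and the one-pixel refinement radius are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(a.astype(int) - b.astype(int)).sum()

def seeded_search(prev, curr, y, x, seed, block=8, radius=1):
    """Refine a precomputed motion vector `seed` (e.g., read from the
    MPEG stream) with a small local SAD search, instead of scanning a
    full search window around (0, 0)."""
    H, W = prev.shape
    ref = curr[y:y + block, x:x + block]
    best, best_mv = None, seed
    for dy in range(seed[0] - radius, seed[0] + radius + 1):
        for dx in range(seed[1] - radius, seed[1] + radius + 1):
            py, px = y + dy, x + dx
            if py < 0 or px < 0 or py + block > H or px + block > W:
                continue  # candidate block falls outside the frame
            cost = sad(ref, prev[py:py + block, px:px + block])
            if best is None or cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv
```

With a radius of 1, only 9 candidates are evaluated per block instead of the (2r+1)^2 candidates of a full search, which is where the computational savings come from.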
The rest of this paper is organized as follows: In Section 2 we review related SoA approaches for activity recognition and motion estimation. In Section 3 we present our activity recognition framework in detail, including the motion estimation methods used. The aspects of video encoding examined, namely the effectiveness of different video codecs and various encoding parameters, are detailed in Section 4. Experimental results are presented in detail in Section 5, comparing OF with BM, BM with BM seeded by MPEG vectors, the computational efficiency of all methods, and a joint performance metric. Finally, Section 6 concludes this paper and outlines our plans for future work.
Activity recognition methods
Numerous approaches have been developed in recent years for activity recognition. Our work is closely related to methods based on trajectories of interest points [1], [3], [4], [5], [6], [7], [17], [18], [19], [20], [21], which can be roughly divided into those where interest points are sampled sparsely or densely.
The approaches of the first category extract sparse interest points via standard interest point detectors and track them over time. Messing et al. [17] tracked corner points using the
Action representation
In this section, we describe our activity representation and recognition framework, as well as the motion estimation methods used in our experiments. The overall schema of our framework is depicted in Fig. 1.
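As a rough illustration of the trajectory-based representation (the precise sampling, median filtering and descriptor steps follow [1] and are not reproduced here), dense grid points can be propagated through per-frame motion fields, whichever estimator (OF, BM, or MPEG vectors) produced them. Names and the nearest-neighbour field lookup are illustrative simplifications:

```python
import numpy as np

def track_dense_points(flows, step=8):
    """Propagate a dense grid of points through a sequence of motion
    fields `flows` (list of HxWx2 arrays of (dy, dx) per pixel), as in
    dense-trajectory pipelines. Returns one trajectory per grid point."""
    H, W = flows[0].shape[:2]
    points = [(y, x) for y in range(0, H, step) for x in range(0, W, step)]
    trajectories = [[p] for p in points]
    for flow in flows:
        for traj in trajectories:
            y, x = traj[-1]
            # nearest-neighbour lookup of the motion field at the point
            iy = min(max(int(round(y)), 0), H - 1)
            ix = min(max(int(round(x)), 0), W - 1)
            dy, dx = flow[iy, ix]
            traj.append((y + dy, x + dx))
    return trajectories
```

Descriptors (e.g., HOG/HOF/MBH in [1]) are then computed along each trajectory and encoded, for instance with Fisher vectors, before classification.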
MPEG video encoding
In this section, we examine the effect of MPEG video encoding on activity recognition accuracy. Initially, we compare video codecs so as to choose the most appropriate one for our experiments. Then we explore the trade-offs between activity recognition accuracy, computational efficiency and compression efficiency (video quality vs. bit rate), and examine how different encoding parameters affect recognition accuracy and computational/compression efficiency. Finally,
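Video quality in such Rate-Distortion comparisons is conventionally measured by the Peak Signal-to-Noise Ratio (PSNR) between the original and the encoded/decoded frame. A minimal sketch, assuming 8-bit frames (peak value 255):

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between an original frame and
    its encoded/decoded version; higher means less distortion."""
    mse = np.mean((original.astype(float) - decoded.astype(float)) ** 2)
    if mse == 0:
        return float('inf')  # identical frames: lossless
    return 10.0 * np.log10(peak ** 2 / mse)
```

Averaging PSNR over all frames of a sequence, and plotting it against bit rate, yields the Rate-Distortion curves used to compare codecs.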
Experimental results
We have performed comprehensive experiments on uncompressed videos of human activities, to compare the effect of different motion estimation and encoding techniques on recognition accuracy and computational efficiency on several benchmark datasets. We also provide comparisons with the SoA to determine the optimal configuration for recognizing ADLs. We revisit this discussion in Section 5.6, where we assess our results in both the compressed and uncompressed video domains using a hybrid metric,
Conclusions and future work
In this work, we proposed a complete framework for efficient recognition, processing and coding of activities of daily living (ADLs), as captured by standard 2D cameras. Our approach follows the SoA [1], [3], [20], where trajectories of tracked visual features are extracted on dense grids and recognition takes place via Fisher vectors and Support Vector Machines. In contrast to these approaches, which are based on dense OF, we use video downsampling, fast block matching motion estimation and
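The downsampling step mentioned above can be as simple as 2x2 block averaging before motion estimation, which cuts the cost of subsequent block matching roughly fourfold. A minimal sketch (illustrative; not necessarily the exact filter used in this work):

```python
import numpy as np

def downsample2x(frame):
    """Halve the spatial resolution of a grayscale frame by averaging
    each non-overlapping 2x2 pixel block."""
    H, W = frame.shape
    f = frame[:H - H % 2, :W - W % 2].astype(float)  # crop odd edges
    return (f[0::2, 0::2] + f[0::2, 1::2] +
            f[1::2, 0::2] + f[1::2, 1::2]) / 4.0
```

The recovered motion vectors are simply scaled back by 2 when mapped onto the full-resolution grid.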
Acknowledgement
This work was funded by the European Commission under the 7th Framework Programme (FP7 2007-2013), Grant Agreement 288199 Dem@Care.
References (62)
- et al., An efficient approach to content-based object retrieval in videos, Neurocomputing, 2011.
- et al., Machine learning and signal processing for human pose recovery and behavior analysis, Signal Process., 2015.
- H. Wang, A. Klaser, C. Schmid, C. Liu, Action recognition by dense trajectories, in: Proceedings of the International...
- I. Laptev, M. Marszalek, C. Schmid, B. Rozenfeld, Learning realistic human actions from movies, in: Proceedings of the...
- et al., Dense trajectories and motion boundary descriptors for action recognition, Int. J. Comput. Vis., 2013.
- H. Wang, C. Schmid, Action recognition with improved trajectories, in: Proceedings of the International Conference on...
- M. Jain, H. Jegou, P. Bouthemy, Better exploiting motion for better action recognition, in: Proceedings of the...
- Y.G. Jiang, Q. Dai, X. Xue, W. Liu, C. Ngo, Trajectory-based modeling of human actions with motion reference points, ...
- et al., Segmentation of moving objects by long term video analysis, IEEE Trans. Pattern Anal. Mach. Intell., 2014.
- Two-frame motion estimation based on polynomial expansion, Image Anal., 2003.
- Recognition of human actions using motion history information extracted from the compressed video, Image Vis. Comput.
- High-speed action recognition and localization in compressed domain videos, IEEE Trans. Circuits Syst. Video Technol.
- Comparison of the coding efficiency of video coding standards - including High Efficiency Video Coding (HEVC), IEEE Trans. Circuits Syst. Video Technol.
- HEVC: the new gold standard for video compression: how does HEVC compare with H.264/AVC?, IEEE Consum. Electron. Mag.
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis.
- Aggregating local image descriptors into compact codes, IEEE Trans. Pattern Anal. Mach. Intell.
- Large displacement optical flow: descriptor matching in variational motion estimation, IEEE Trans. Pattern Anal. Mach. Intell.
- Multitask linear discriminant analysis for view invariant action recognition, IEEE Trans. Image Process.