An efficient view-invariant framework for recognizing human activities from an input video
sequence is presented. The proposed framework is composed of three consecutive modules: (i) detection and
localization of people by background subtraction, (ii) creation of view-invariant spatiotemporal templates for different activities,
and (iii) template matching for view-invariant activity recognition. The foreground
objects present in a scene are extracted using change detection and background modeling. The view-invariant
templates are constructed from motion history images and object shape information for the different human
activities in a video sequence. The spatiotemporal templates of the various activities are matched using moment
invariants and the Mahalanobis distance. The proposed approach is tested successfully on our own
viewpoint dataset, KTH action recognition dataset, i3DPost multiview dataset, MSR viewpoint action dataset,
VideoWeb multiview dataset, and WVU multiview human action recognition dataset. The experimental
results and analysis over the chosen datasets show that the proposed framework is robust, flexible,
and efficient with respect to multi-view activity recognition and to scale and phase variations.
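
As an illustration only, the template-creation and matching stages can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the implementation evaluated here: the function names (update_mhi, hu_moments4, mahalanobis), the linear decay of one unit per frame, and the use of only the first four Hu moment invariants are all hypothetical choices made for brevity. The foreground mask is assumed to come from a separate background subtraction step, and the template is assumed to be non-empty.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """One motion-history-image step (assumed linear decay).

    Pixels flagged as moving are set to tau; all others decay by 1, floored at 0.
    """
    return np.where(motion_mask, float(tau), np.maximum(mhi - 1.0, 0.0))

def hu_moments4(img):
    """First four Hu moment invariants of a non-empty 2-D template."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc = (xs * img).sum() / m00   # centroid, x
    yc = (ys * img).sum() / m00   # centroid, y

    def eta(p, q):
        # Central moment normalized for scale.
        mu = ((xs - xc) ** p * (ys - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    return np.array([h1, h2, h3, h4])

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of feature vector x from a class (mean, cov)."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(np.asarray(cov, dtype=float), d)))
```

In such a sketch, a test template would be assigned to the activity class whose stored mean and covariance of moment features give the smallest Mahalanobis distance; the covariance matrix must be invertible for the distance to be defined.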