Abstract:
Human-action recognition through local spatio-temporal features has been widely applied because of its simplicity and reasonable computational complexity. The most common method to represent such features is the well-known Bag-of-Words approach, which turns a Multiple-Instance Learning problem into a supervised learning one that can be addressed by a standard classifier. This paper presents a learning framework for human-action recognition that follows this strategy. First, spatio-temporal features are detected. Second, they are described by HOG-HOF descriptors and then represented by a Bag-of-Words approach to create a feature-vector representation. The resulting high-dimensional features are reduced by means of a subspace-random-projection technique that is able to retain almost all the original information. Lastly, the reduced feature vectors are delivered to a classifier called Citation K-Nearest Neighborhood, especially adapted to Multiple-Instance Learning frameworks. Excellent results have been obtained, outperforming other state-of-the-art approaches on a public database.
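The two middle stages of the pipeline described above (Bag-of-Words encoding followed by dimensionality reduction) can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptors are random toy vectors rather than HOG-HOF, the codebook is random rather than learned, a plain Gaussian random projection stands in for the subspace-random-projection technique, and the Citation K-Nearest Neighborhood classifier is omitted. All function names and sizes are hypothetical.

```python
import math
import random


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return a normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: math.dist(d, codebook[k]))
        hist[nearest] += 1.0
    total = sum(hist) or 1.0  # guard against an empty bag of descriptors
    return [h / total for h in hist]


def random_projection_matrix(in_dim, out_dim, rng):
    """Gaussian random matrix scaled by 1/sqrt(out_dim) (assumed stand-in technique)."""
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)] for _ in range(in_dim)]


def project(vec, matrix):
    """Multiply a row vector by the projection matrix."""
    out_dim = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(out_dim)]


rng = random.Random(0)

# Toy setup: 20 codewords and 8-dimensional descriptors (sizes are arbitrary).
codebook = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]

# Each "video" is a bag with a variable number of local descriptors.
videos = [
    [[rng.gauss(0, 1) for _ in range(8)] for _ in range(rng.randint(15, 30))]
    for _ in range(4)
]

# One fixed-length histogram per video, then a reduction from 20 to 5 dimensions.
histograms = [bow_histogram(v, codebook) for v in videos]
R = random_projection_matrix(in_dim=20, out_dim=5, rng=rng)
reduced = [project(h, R) for h in histograms]
```

Each video, originally a variable-size bag of descriptors, ends up as one fixed-length vector, which is what allows a standard classifier to be applied afterwards.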
Published in: 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
Date of Conference: 25-28 August 2015
Date Added to IEEE Xplore: 26 October 2015