Abstract:
The Bag-of-Words scheme has almost become de rigueur for event recognition tasks due to its robustness and simplicity. Despite its effectiveness, this technique discards spatial and temporal relationships between codewords. This paper tackles the problem of building a video codeword representation that captures such relationships. We develop a new method that harnesses spatio-temporal boundaries and discriminative codeword co-occurrences. Given a set of videos and their corresponding quantized features, each video is first decomposed into spatio-temporal volumes by a multi-scale video segmentation algorithm. Meaningful codeword co-occurrences are then extracted within each volume, and each video is represented by a histogram of co-occurring features. The set of histograms is finally fed to an SVM for classification. Evaluation on the realistic TRECVID MED11 challenge database validates the approach.
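The histogram step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each local feature already carries a codeword label and the identifier of the spatio-temporal volume it falls in, and that the set of codeword pairs to keep has been chosen beforehand. The function name `cooccurrence_histogram` and its parameters are hypothetical, and the segmentation, the selection of discriminative pairs, and the SVM are left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_histogram(codewords, volume_ids, pairs):
    """Count how often each selected codeword pair co-occurs inside a volume.

    codewords  : list of int, codeword label of each local feature
    volume_ids : list of int, volume each feature falls in (assumed given)
    pairs      : list of (int, int) with a <= b, the pairs to keep (assumed given)
    """
    # Group codewords by the volume they belong to.
    by_volume = {}
    for word, vol in zip(codewords, volume_ids):
        by_volume.setdefault(vol, []).append(word)

    counts = Counter()
    for words in by_volume.values():
        # Two features co-occur when they lie in the same volume.
        for a, b in combinations(words, 2):
            counts[(min(a, b), max(a, b))] += 1

    hist = [counts[p] for p in pairs]
    total = sum(hist)
    # L1-normalise so that videos with different feature counts are comparable.
    return [h / total for h in hist] if total else [0.0] * len(pairs)


# Toy video: six features spread over two volumes.
codewords = [0, 1, 1, 2, 0, 2]
volume_ids = [0, 0, 0, 1, 1, 1]
pairs = [(0, 1), (1, 1), (0, 2), (2, 2)]
hist = cooccurrence_histogram(codewords, volume_ids, pairs)
```

In this toy case, pairs that span the two volumes, such as codeword 1 with codeword 2, are never counted, which is how the volume boundaries restrict the co-occurrence statistics. One such vector per video would then be passed to the classifier.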
Date of Conference: 24-26 March 2014
Date Added to IEEE Xplore: 23 June 2014
Electronic ISBN: 978-1-4799-4985-4
Print ISSN: 1550-5790