Semantic Concept Detection Using Dense Codeword Motion

Tănase, Claudiu; Mérialdo, Bernard

doi:10.1007/978-3-319-02895-8_63


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8192)

Included in the following conference series:

International Conference on Advanced Concepts for Intelligent Vision Systems


Abstract

When detecting semantic concepts in video, much of the existing research in content-based classification uses keyframe information only. In particular, the combination of local features such as SIFT with the Bag of Words model is very popular among TRECVID participants. The few existing motion and spatiotemporal descriptors are computationally heavy and become impractical when applied to large datasets such as TRECVID. In this paper, we propose a way to efficiently combine positional motion obtained from optical flow in the keyframe with the information given by the Dense SIFT Bag of Words feature. The features we propose work by spatially binning motion vectors belonging to the same codeword into separate histograms describing movement direction (left, right, vertical, zero, etc.). Features are mapped using the homogeneous kernel map technique for approximating the χ² kernel, and classifiers are then trained efficiently using linear SVMs. By using a simple linear fusion technique, we improve the Mean Average Precision of the Bag of Words DSIFT classifier on the TRECVID 2010 Semantic Indexing benchmark from 0.0924 to 0.0972, an increase confirmed to be statistically significant by the standardized TRECVID randomization tests.
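
As an illustration only, the per-codeword direction binning described in the abstract might be sketched as follows. This is not the authors' implementation: the four-bin layout, the zero-motion threshold, the L1 normalisation, and the function names `direction_bin` and `codeword_motion_histogram` are all assumptions made for the example, and the spatial grid mentioned in the abstract is omitted for brevity.

```python
import math

# Hypothetical direction bins; the abstract names left, right, vertical and zero.
BINS = ("zero", "left", "right", "vertical")


def direction_bin(dx, dy, zero_thresh=0.5):
    """Map one flow vector to a bin index (the threshold is an assumption)."""
    if math.hypot(dx, dy) < zero_thresh:
        return BINS.index("zero")
    if abs(dy) > abs(dx):
        return BINS.index("vertical")
    return BINS.index("left") if dx < 0 else BINS.index("right")


def codeword_motion_histogram(codewords, flows, num_codewords, zero_thresh=0.5):
    """Count motion directions separately for every codeword.

    codewords: codeword index assigned to each dense keypoint
    flows:     (dx, dy) optical-flow vector sampled at the same keypoints
    Returns a flat, L1-normalised list of length num_codewords * len(BINS).
    """
    hist = [0.0] * (num_codewords * len(BINS))
    for word, (dx, dy) in zip(codewords, flows):
        hist[word * len(BINS) + direction_bin(dx, dy, zero_thresh)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    words = [0, 0, 1, 1]
    flows = [(0.0, 0.0), (-2.0, 0.1), (3.0, 0.5), (0.2, -4.0)]
    print(codeword_motion_histogram(words, flows, num_codewords=2))
```

A vector of this kind could then be passed through an explicit χ² feature map and a linear SVM, as the abstract outlines; those steps are left out here.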

The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-319-02895-8_64



Author information

Authors and Affiliations

EURECOM, Campus SophiaTech, 450 Route des Chappes, 06410, Biot, France
Claudiu Tănase & Bernard Mérialdo


Editor information

Editors and Affiliations

DGA, 7-9 rue des mathurins, 92 221, Bagneux, France
Jacques Blanc-Talon
Institute of Control and Information Engineering, Poznań University of Technology, Piotrowo 3A, 60-965, Poznań, Poland
Andrzej Kasinski
Telecommunications and Information Processing (TELIN), Ghent University, St.-Pietersnieuwstraat 41, 9000, Ghent, Belgium
Wilfried Philips
CSIRO ICT Centre, Epping, Po Box 76, 1710, Sydney, NSW, Australia
Dan Popescu
University of Antwerp, Universiteitsplein 1, Building N., 2610, Wilrijk, Antwerp, Belgium
Paul Scheunders


About this paper

Cite this paper

Tănase, C., Mérialdo, B. (2013). Semantic Concept Detection Using Dense Codeword Motion. In: Blanc-Talon, J., Kasinski, A., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2013. Lecture Notes in Computer Science, vol 8192. Springer, Cham. https://doi.org/10.1007/978-3-319-02895-8_63

DOI: https://doi.org/10.1007/978-3-319-02895-8_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02894-1
Online ISBN: 978-3-319-02895-8
eBook Packages: Computer Science, Computer Science (R0)
