Skip to main content

Semantic Concept Detection Using Dense Codeword Motion

  • Conference paper
Advanced Concepts for Intelligent Vision Systems (ACIVS 2013)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 8192))

  • 3222 Accesses

Abstract

When detecting semantic concepts in video, much of the existing research in content-based classification uses keyframe information only. Particularly the combination between local features such as SIFT and the Bag of Words model is very popular with TRECVID participants. The few existing motion and spatiotemporal descriptors are computationally heavy and become impractical when applied on large datasets such as TRECVID. In this paper, we propose a way to efficiently combine positional motion obtained from optic flow in the keyframe with information given by the Dense SIFT Bag of Words feature. The features we propose work by spatially binning motion vectors belonging to the same codeword into separate histograms describing movement direction (left, right, vertical, zero, etc.). Classifiers are mapped using the homogeneous kernel map techinque for approximating the χ2 kernel and then trained efficiently using linear SVM. By using a simple linear fusion technique we can improve the Mean Average Precision of the Bag of Words DSIFT classifier on the TRECVID 2010 Semantic Indexing benchmark from 0.0924 to 0.0972, which is confirmed to be a statistically significant increase based on standardized TRECVID randomization tests.

The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-319-02895-8_64

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Chen, M., Hauptmann, A.: Mosift: Recognizing human actions in surveillance videos (2009)

    Google Scholar 

  2. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: Liblinear: A library for large linear classification. The Journal of Machine Learning Research 9, 1871–1874 (2008)

    MATH  Google Scholar 

  3. Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Bigun, J., Gustavsson, T. (eds.) SCIA 2003. LNCS, vol. 2749, pp. 363–370. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  4. Gorisse, D., Precioso, F.: IRIM at TRECVID 2010: Semantic Indexing and Instance Search. In: TREC Online Proceedings, Gaithersburg, United States. gDR ISIS (November 2010)

    Google Scholar 

  5. Ikizler-Cinbis, N., Sclaroff, S.: Object, scene and actions: Combining multiple features for human action recognition. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part I. LNCS, vol. 6311, pp. 494–507. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  6. Jégou, H., Perronnin, F., Douze, M., Sánchez, J., Pérez, P., Schmid, C.: Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence (2011), http://hal.inria.fr/inria-00633013

  7. Jurie, F., Triggs, B.: Creating efficient codebooks for visual recognition. In: Tenth IEEE International Conference on Computer Vision, ICCV 2005, vol. 1, pp. 604–610. IEEE (2005)

    Google Scholar 

  8. Laptev, I., Lindeberg, T.: Space-time interest points. In: Proceedings of the Ninth IEEE International Conference on Computer Vision 2003, vol. 1, pp. 432–439 (October 2003)

    Google Scholar 

  9. Lowe, D.: Object recognition from local scale-invariant features. In: The Proceedings of the Seventh IEEE International Conference on Computer Vision 1999, vol. 2, pp. 1150–1157. IEEE (1999)

    Google Scholar 

  10. Over, P., Awad, G., Fiscus, J., Antonishek, B., Michel, M., Smeaton, A., Kraaij, W., Quénot, G., et al.: An overview of the goals, tasks, data, evaluation mechanisms and metrics. In: TRECVID 2011-TREC Video Retrieval Evaluation Online (2011)

    Google Scholar 

  11. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: Machine learning in python. The Journal of Machine Learning Research 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  12. Vedaldi, A., Fulkerson, B.: VLFeat: An open and portable library of computer vision algorithms (2008)

    Google Scholar 

  13. Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3539–3546. IEEE ( (2010)

    Google Scholar 

  14. Wang, F., Jiang, Y.G., Ngo, C.W.: Video event detection using motion relativity and visual relatedness. In: Proceedings of the 16th ACM International Conference on Multimedia, pp. 239–248. ACM (2008)

    Google Scholar 

  15. Wang, H., Klaser, A., Schmid, C., Liu, C.: Action recognition by dense trajectories. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3169–3176. IEEE (2011)

    Google Scholar 

  16. Wang, H., Ullah, M., Klaser, A., Laptev, I., Schmid, C., et al.: Evaluation of local spatio-temporal features for action recognition. In: BMVC 2009-British Machine Vision Conference (2009)

    Google Scholar 

  17. Zhou, X., Yu, K., Zhang, T., Huang, T.S.: Image classification using super-vector coding of local image descriptors. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 141–154. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tănase, C., Mérialdo, B. (2013). Semantic Concept Detection Using Dense Codeword Motion. In: Blanc-Talon, J., Kasinski, A., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2013. Lecture Notes in Computer Science, vol 8192. Springer, Cham. https://doi.org/10.1007/978-3-319-02895-8_63

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-02895-8_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02894-1

  • Online ISBN: 978-3-319-02895-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics