
Spatio-Temporal Tube data representation and Kernel design for SVM-based video object retrieval system

Multimedia Tools and Applications

Abstract

In this article, we propose a new video object retrieval system. Our approach is based on a Spatio-Temporal data representation, a dedicated kernel design, and a statistical learning toolbox for video object recognition and retrieval. Using state-of-the-art video object detection algorithms (for faces or cars, for example), we segment video object tracks from shots of real movies. From these tracks, we then extract sets of spatio-temporally coherent features that we call Spatio-Temporal Tubes. To compare these complex tube objects, we design a Spatio-Temporal Tube Kernel (STTK) function. Based on this kernel similarity, we present both supervised and active learning strategies embedded in a Support Vector Machine (SVM) framework. Additionally, we propose a multi-class classification framework that handles unbalanced data. Our approach is successfully evaluated on two real movie databases: the French movie “L’esquive” and episodes of the TV series “Buffy, the Vampire Slayer”. Our method is also tested on a car database (built from real movies) and shows promising results for the car identification task.
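To make the idea of "a kernel between sets of features" concrete, here is a minimal sketch of a generic set kernel. It is not the paper's STTK, whose definition is only in the full article. It assumes, hypothetically, that each tube is given as a flat list of fixed-length feature vectors, and it uses the standard mean-match kernel (the average of all pairwise Gaussian RBF values), which is positive definite and can therefore be used inside an SVM. The names `rbf`, `mean_set_kernel`, `gram_matrix` and the `gamma` parameter are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
# Hypothetical tube representation: a list of feature vectors collected along a track.
Tube = List[Vector]


def rbf(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel between two feature vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same dimension")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_set_kernel(t1: Tube, t2: Tube, gamma: float = 0.5) -> float:
    """Average of all pairwise RBF values between the features of two tubes.

    Stand-in set kernel for illustration only (NOT the paper's STTK):
    it ignores any spatial or temporal relations between the features.
    """
    if not t1 or not t2:
        raise ValueError("tubes must contain at least one feature vector")
    total = sum(rbf(x, y, gamma) for x in t1 for y in t2)
    return total / (len(t1) * len(t2))


def gram_matrix(tubes: List[Tube], gamma: float = 0.5) -> List[List[float]]:
    """Kernel (Gram) matrix over a list of tubes, e.g. for an SVM with a precomputed kernel."""
    return [[mean_set_kernel(a, b, gamma) for b in tubes] for a in tubes]


if __name__ == "__main__":
    # Toy tubes of different lengths with 2-D features (made-up values).
    tube_a = [[0.0, 0.0], [0.1, 0.0]]
    tube_b = [[0.0, 0.1]]
    tube_c = [[3.0, 3.0], [3.1, 2.9], [2.9, 3.0]]
    for row in gram_matrix([tube_a, tube_b, tube_c]):
        print(["%.3f" % v for v in row])
```

Because the kernel averages over all feature pairs, tubes of different lengths are compared without any alignment step. A Gram matrix like this can be passed to an SVM solver that accepts precomputed kernels (for instance scikit-learn's `SVC(kernel="precomputed")`), which is one way to plug a custom set similarity into supervised or active SVM learning. The trade-off is that the mean-match kernel discards the spatio-temporal coherence of the features, so it should be read only as a baseline for comparing sets, not as a reconstruction of STTK.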


(Figs. 1–13: images not reproduced)



Acknowledgements

We warmly thank Andrew Zisserman and Josef Sivic for providing the data that allowed us to compare our results with theirs, and for the very interesting exchanges we had on this work. We also thank Philippe-Henri Gosselin for providing the code for kernel-based SVM with active learning within the RETIN retrieval system.

Author information


Correspondence to Shuji Zhao.

Additional information

This work is funded by Région Île-de-France, project k-VideoScan 2007-34HD Digiteo.


About this article

Cite this article

Zhao, S., Precioso, F. & Cord, M. Spatio-Temporal Tube data representation and Kernel design for SVM-based video object retrieval system. Multimed Tools Appl 55, 105–125 (2011). https://doi.org/10.1007/s11042-010-0602-3



  • DOI: https://doi.org/10.1007/s11042-010-0602-3

