Abstract
In this article, we propose a new video object retrieval system. Our approach is based on a spatio-temporal data representation, a dedicated kernel design, and a statistical learning toolbox for video object recognition and retrieval. Using state-of-the-art video object detection algorithms (for faces or cars, for example), we segment video object tracks from shots of real movies. We then extract from these tracks sets of spatio-temporally coherent features that we call Spatio-Temporal Tubes. To compare these complex tube objects, we design a Spatio-Temporal Tube Kernel (STTK) function. Based on this kernel similarity, we present both supervised and active learning strategies embedded in a Support Vector Machine framework. Additionally, we propose a multi-class classification framework that deals with unbalanced data. We evaluate our approach on two databases built from real movies: the French movie “L’esquive” and episodes from the TV series “Buffy the Vampire Slayer”. We also test the method on a car database (from real movies), where it shows promising results for the car identification task.
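The abstract does not give the form of the STTK, so the following Python sketch only illustrates the general idea of a kernel defined on sets of tubes. Everything in it is an assumption made for illustration: a track is modelled as a list of tubes, a tube as a list of per-frame descriptor vectors, and the similarity is a generic averaged Gaussian set kernel standing in for the actual STTK. The names `tube_kernel`, `track_kernel`, `gram_matrix` and the parameter `gamma` are hypothetical.

```python
import math

def tube_mean(tube):
    """Average the per-frame descriptors of one tube into a single vector."""
    dim = len(tube[0])
    return [sum(frame[d] for frame in tube) / len(tube) for d in range(dim)]

def tube_kernel(tube_a, tube_b, gamma=1.0):
    """Gaussian kernel between the mean descriptors of two tubes (assumed form)."""
    mean_a, mean_b = tube_mean(tube_a), tube_mean(tube_b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    return math.exp(-gamma * sq_dist)

def track_kernel(track_a, track_b, gamma=1.0):
    """Average the tube kernel over all tube pairs of two tracks.

    Averaging a positive semi-definite kernel over all pairs of set
    elements yields a positive semi-definite kernel on the sets.
    """
    total = sum(tube_kernel(ta, tb, gamma) for ta in track_a for tb in track_b)
    return total / (len(track_a) * len(track_b))

def gram_matrix(tracks, gamma=1.0):
    """Pairwise track similarities, usable by a kernel-based classifier."""
    return [[track_kernel(a, b, gamma) for b in tracks] for a in tracks]

# Toy data (hypothetical): three tracks, two tubes each, 2-D descriptors.
track_1 = [[[0.0, 0.0], [0.1, 0.0]], [[1.0, 1.0], [1.0, 1.1]]]
track_2 = [[[0.0, 0.1], [0.1, 0.1]], [[1.0, 0.9], [0.9, 1.0]]]
track_3 = [[[5.0, 5.0], [5.1, 5.0]], [[6.0, 6.0], [6.0, 6.1]]]

gram = gram_matrix([track_1, track_2, track_3])
```

In this toy example, `track_1` and `track_2` have nearby descriptors and therefore a higher kernel value than `track_1` and `track_3`. A Gram matrix of this kind could in principle be passed to any SVM implementation that accepts a precomputed kernel.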
Acknowledgements
We warmly thank Andrew Zisserman and Josef Sivic for providing the data used to compare our results with theirs, and for the very interesting exchanges we had on this work. We also thank Philippe-Henri Gosselin for providing the code for the kernel-based SVM with active learning within the RETIN retrieval system.
Additional information
This work is funded by Région Île-de-France, project k-VideoScan 2007-34HD Digiteo.
About this article
Cite this article
Zhao, S., Precioso, F. & Cord, M. Spatio-Temporal Tube data representation and Kernel design for SVM-based video object retrieval system. Multimed Tools Appl 55, 105–125 (2011). https://doi.org/10.1007/s11042-010-0602-3
DOI: https://doi.org/10.1007/s11042-010-0602-3