Abstract
This paper addresses the retrieval of video sequences that contain a spatio-temporal pattern specified by a user. The visual content of each video sequence is first decomposed by analysing the dynamics of its local features. The camera motion is represented by the parameters of the estimated global motion model; the background and the objects present in the captured scene by the appearance of the extracted local features; and the events occurring within the scene by the trajectories of those features. At query time, a probabilistic model of the visual pattern is estimated from user interaction, captured through a relevance-feedback loop. We show that the method efficiently retrieves video sequences that share, even partially, a spatio-temporal pattern.
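The decomposition into camera motion and scene content can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a translation-only global motion model (the abstract does not state which model is used), uses a RANSAC-style consensus search as a stand-in estimator, and all function and parameter names (`estimate_global_translation`, `split_background_and_objects`, `threshold`, `iterations`) are hypothetical. Features whose displacement agrees with the estimated global motion are treated as background; the rest are candidates for independently moving objects.

```python
import random


def estimate_global_translation(displacements, threshold=1.0, iterations=100, seed=0):
    """RANSAC-style fit of a translation-only global motion model (an assumption).

    displacements: list of (dx, dy), one per feature tracked between two frames.
    Returns ((dx, dy) of the global motion, sorted indices of consistent features).
    """
    if not displacements:
        raise ValueError("need at least one displacement")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # A pure translation is fully determined by a single correspondence.
        cx, cy = rng.choice(displacements)
        inliers = [
            i for i, (dx, dy) in enumerate(displacements)
            if (dx - cx) ** 2 + (dy - cy) ** 2 <= threshold ** 2
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set: the mean displacement of the inliers.
    n = len(best_inliers)
    mean_dx = sum(displacements[i][0] for i in best_inliers) / n
    mean_dy = sum(displacements[i][1] for i in best_inliers) / n
    return (mean_dx, mean_dy), best_inliers


def split_background_and_objects(displacements, **kwargs):
    """Separate features that follow the global motion from those that do not."""
    model, inliers = estimate_global_translation(displacements, **kwargs)
    inlier_set = set(inliers)
    moving = [i for i in range(len(displacements)) if i not in inlier_set]
    return model, inliers, moving


if __name__ == "__main__":
    # Seven features follow a rightward pan; two move independently.
    disp = [(2.0, 0.0)] * 6 + [(2.1, -0.1), (10.0, 5.0), (-8.0, 3.0)]
    model, background, moving = split_background_and_objects(disp)
    print("global motion:", model)
    print("background features:", background)
    print("moving features:", moving)
```

A richer model (for example affine) would need more than one correspondence per hypothesis, but the consensus-then-refit structure stays the same.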
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moënne-Loccoz, N., Bruno, E., Marchand-Maillet, S. (2006). Interactive Retrieval of Video Sequences from Local Feature Dynamics. In: Detyniecki, M., Jose, J.M., Nürnberger, A., van Rijsbergen, C.J. (eds) Adaptive Multimedia Retrieval: User, Context, and Feedback. AMR 2005. Lecture Notes in Computer Science, vol 3877. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11670834_11
DOI: https://doi.org/10.1007/11670834_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32174-3
Online ISBN: 978-3-540-32175-0
eBook Packages: Computer Science, Computer Science (R0)