Abstract
We describe an image representation for objects and scenes consisting of a configuration of viewpoint covariant regions and their descriptors. This representation enables recognition to proceed successfully despite changes in scale, viewpoint, illumination and partial occlusion. Vector quantization of these descriptors then enables efficient matching on the scale of an entire feature film.
We show two applications. The first is to efficient object retrieval where the technology of text retrieval, such as inverted file systems, can be employed at run time to return all shots containing the object in a manner, and with a speed, similar to a Google search for text. The object is specified by a user outlining it in an image, and the object is then delineated in the retrieved shots.
The second application is to data mining. We obtain the principal objects, characters and scenes in a video by measuring the reoccurrence of these spatial configurations of viewpoint covariant regions.
The applications are illustrated on two full length feature films.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Aner, A., Kender, J.R.: Video summaries through mosaic-based shot and scene clustering. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 388–402. Springer, Heidelberg (2002)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press, New York (1999), ISBN: 020139829
Boujemaa, N., Fauqueur, J., Gouet, V.: What’s beyond query by example? In: Trends and Advances in Content-Based Image and Video Retrieval (2004)
Gong, Y., Liu, X.: Generating optimal video summaries. In: IEEE Intl. Conf. on Multimedia and Expo (III), pp. 1559–1562 (2000)
Lowe, D.: Object recognition from local scale-invariant features. In: Proc. ICCV, pp. 1150–1157 (September 1999)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proc. BMVC, pp. 384–393 (2002)
Mikolajczyk, K., Schmid, C.: Indexing based on scale invariant interest points. In: Proc. ICCV (2001)
Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 128–142. Springer, Heidelberg (2002)
Obdrzalek, S., Matas, J.: Object recognition using local affine frames on distinguished regions. In: Proc. BMVC, pp. 113–122 (2002)
Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3d object modeling and recognition using affine-invariant patches and multi-view spatial constraints. In: Proc. CVPR (2003)
Schaffalitzky, F., Zisserman, A.: Multi-view matching for unordered image sets, or “How do I organize my holiday snaps? In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 414–431. Springer, Heidelberg (2002)
Schmid, C., Mohr, R.: Local greyvalue invariants for image retrieval. IEEE PAMI 19(5), 530–534 (1997)
Sivic, J., Zisserman, A.: Video google: A text retrieval approach to object matching in videos. In: Proc. ICCV (October 2003)
Sivic, J., Zisserman, A.: Video data mining using configurations of viewpoint invariant regions. In: Proc. CVPR (2004)
Squire, D.M., Müller, W., Müller, H., Pun, T.: Content-based query of image databases: inspirations from text retrieval. Pattern Recognition Letters 21, 1193–1198 (2000)
Tseng, B., Lin, C.-Y., Smith, J.R.: Video personalization and summarization system. In: MMSP (2002)
Tuytelaars, T., Van Gool, L.: Wide baseline stereo matching based on local, affinely invariant regions. In: Proc. BMVC, pp. 412–425 (2000)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sivic, J., Zisserman, A. (2004). Efficient Visual Content Retrieval and Mining in Videos. In: Aizawa, K., Nakamura, Y., Satoh, S. (eds) Advances in Multimedia Information Processing - PCM 2004. PCM 2004. Lecture Notes in Computer Science, vol 3332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30542-2_58
Download citation
DOI: https://doi.org/10.1007/978-3-540-30542-2_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23977-2
Online ISBN: 978-3-540-30542-2
eBook Packages: Computer ScienceComputer Science (R0)