Efficient Visual Content Retrieval and Mining in Videos

Sivic, Josef; Zisserman, Andrew

doi:10.1007/978-3-540-30542-2_58

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3332)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

We describe an image representation for objects and scenes consisting of a configuration of viewpoint covariant regions and their descriptors. This representation enables recognition to proceed successfully despite changes in scale, viewpoint, illumination and partial occlusion. Vector quantization of these descriptors then enables efficient matching on the scale of an entire feature film.
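The vector quantization step can be pictured with a minimal sketch: each region descriptor is assigned to its nearest cluster centre (a "visual word"), and a frame is then summarised by its word counts. The two-dimensional descriptors, the three-entry codebook, and the function names below are illustrative assumptions, not taken from the chapter; real descriptors are higher-dimensional and the codebook would be learned by clustering.

```python
from collections import Counter
from math import dist

def quantize(descriptor, codebook):
    """Return the index of the codebook centre nearest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))

def bag_of_words(descriptors, codebook):
    """Count how often each visual word occurs among a frame's descriptors."""
    return Counter(quantize(d, codebook) for d in descriptors)

# Hypothetical toy data: three cluster centres and four region descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frame_descriptors = [(0.1, -0.2), (0.9, 1.2), (4.8, 5.1), (1.1, 0.8)]

histogram = bag_of_words(frame_descriptors, codebook)
print(dict(histogram))
```

Once descriptors are reduced to integer word indices, comparing frames no longer requires nearest-neighbour search over raw descriptors, which is what makes matching feasible at the scale of a whole film.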

We show two applications. The first is to efficient object retrieval where the technology of text retrieval, such as inverted file systems, can be employed at run time to return all shots containing the object in a manner, and with a speed, similar to a Google search for text. The object is specified by a user outlining it in an image, and the object is then delineated in the retrieved shots.
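As a rough illustration of how an inverted file supports this kind of retrieval, the sketch below maps each visual word to the set of shots containing it, so a query only touches shots that share at least one word with the outlined object. The shot identifiers, word indices, and the plain vote-count ranking are assumptions made for the example; the ranking function used in the chapter is not given in the abstract.

```python
from collections import defaultdict

def build_inverted_index(shots):
    """Map each visual word to the set of shot ids in which it occurs."""
    index = defaultdict(set)
    for shot_id, words in shots.items():
        for word in words:
            index[word].add(shot_id)
    return index

def query(index, query_words):
    """Rank shots by the number of distinct query words they contain."""
    votes = defaultdict(int)
    for word in set(query_words):
        for shot_id in index.get(word, ()):
            votes[shot_id] += 1
    # Highest vote count first; ties broken by shot id for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical data: visual words observed in three shots.
shots = {"shot_a": [3, 7, 7, 12], "shot_b": [7, 20], "shot_c": [1, 2]}
index = build_inverted_index(shots)

# Words found inside the user-outlined query region (word 99 occurs nowhere).
ranking = query(index, [7, 12, 99])
print(ranking)
```

Shots that share no word with the query are never visited, which is the property that gives text-style retrieval its speed.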

The second application is to data mining. We obtain the principal objects, characters and scenes in a video by measuring the reoccurrence of these spatial configurations of viewpoint covariant regions.

The applications are illustrated on two full length feature films.

Author information

Authors and Affiliations

Robotics Research Group, Department of Engineering Science, University of Oxford,
Josef Sivic & Andrew Zisserman

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-8656, Tokyo, Japan
Kiyoharu Aizawa
Tokyo Research Laboratory, IBM Research, 1623-14 Shimo-tsuruma, 242-0001, Yamato, Kanagawa, Japan
Yuichi Nakamura
National Institute of Informatics, Tokyo, Japan
Shin’ichi Satoh

About this paper

Cite this paper

Sivic, J., Zisserman, A. (2004). Efficient Visual Content Retrieval and Mining in Videos. In: Aizawa, K., Nakamura, Y., Satoh, S. (eds) Advances in Multimedia Information Processing - PCM 2004. PCM 2004. Lecture Notes in Computer Science, vol 3332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30542-2_58

DOI: https://doi.org/10.1007/978-3-540-30542-2_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23977-2
Online ISBN: 978-3-540-30542-2
eBook Packages: Computer Science, Computer Science (R0)
