Abstract
We present an approach for generating compact video summaries that allow fast, direct access to video data. The video is first segmented into shots and, for appropriate video genres, into scenes, using previously proposed methods. We introduce a new concept that supports a hierarchical representation of video and is based on physical settings and camera locations. We use mosaics to represent and cluster shots, and we detect appropriate mosaics to represent scenes. In contrast to video indexing approaches based on key-frames, our efficient mosaic-based scene representation allows fast clustering of scenes into physical settings, as well as further comparison of physical settings across videos. This enables us to detect plots of different episodes in situation comedies and serves as a basis for indexing whole video sequences. In sports videos, where settings are not as well defined, our approach allows classifying shots for characteristic event detection. We use a novel method for mosaic comparison and create a highly compact non-temporal representation of video. This representation allows accurate comparison of scenes across different videos and serves as a basis for indexing video libraries.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aner, A., Kender, J.R. (2002). Video Summaries through Mosaic-Based Shot and Scene Clustering. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47979-1_26
DOI: https://doi.org/10.1007/3-540-47979-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43748-2
Online ISBN: 978-3-540-47979-6
eBook Packages: Springer Book Archive