Research article. DOI: 10.1145/1374296.1374325

An overview of video shot clustering and summarization techniques for mobile applications

Published: 18 September 2006 Publication History

Abstract

Content characterization of video programmes is of great interest because video appeals to large audiences, and its efficient distribution over various networks should contribute to the widespread use of multimedia services. In this paper we analyze several techniques proposed in the literature for content characterization of video programmes, including movies and sports, that could be helpful for mobile media consumption. In particular, we focus on shot clustering methods and effective video summarization techniques because, in the current video analysis scenario, they facilitate access to the content and help users quickly understand the associated semantics. First we consider shot clustering techniques based on low-level features, using visual, audio and motion information, possibly combined in a multi-modal fashion. Then we concentrate on summarization techniques, such as static storyboards, dynamic video skimming and the extraction of sport highlights. The summarization methods discussed can be employed in the development of tools that would be useful to most mobile users: these algorithms automatically shorten the original video while preserving most events by highlighting only the important content. We analyze the effectiveness of each approach and show that it depends mainly on the kind of video programme it is applied to and on the type of summary or highlights being sought.


Published In

MobiMedia '06: Proceedings of the 2nd international conference on Mobile multimedia communications
September 2006
281 pages
ISBN:1595935177
DOI:10.1145/1374296

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tag

  1. overview of existing video summarization techniques

Qualifiers

  • Research-article


Cited By

  • (2018) Static Video Summarization Using Artificial Bee Colony optimization. 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 777-784. DOI: 10.1109/SSCI.2018.8628784. Online publication date: Nov-2018
  • (2018) Location based abstraction of user generated mobile videos. Image Communication, 27:8, pp. 917-924. DOI: 10.1016/j.image.2012.01.017. Online publication date: 27-Dec-2018
  • (2013) Content-Based Keyframe Clustering Using Near Duplicate Keyframe Identification. Multimedia Data Engineering Applications and Processing, pp. 1-19. DOI: 10.4018/978-1-4666-2940-0.ch001. Online publication date: 2013
  • (2012) Video Summarization. Proceedings of the International Conference on Computer Vision and Graphics - Volume 7594, pp. 1-13. DOI: 10.5555/2942031.2942033. Online publication date: 24-Sep-2012
  • (2012) Video Summarization: Techniques and Classification. Computer Vision and Graphics, pp. 1-13. DOI: 10.1007/978-3-642-33564-8_1. Online publication date: 2012
  • (2012) Location Based Abstraction of User Generated Mobile Videos. Mobile Multimedia Communications, pp. 295-306. DOI: 10.1007/978-3-642-30419-4_25. Online publication date: 2012
  • (2011) Content-Based Keyframe Clustering Using Near Duplicate Keyframe Identification. International Journal of Multimedia Data Engineering and Management, 2:1, pp. 1-21. DOI: 10.4018/jmdem.2011010101. Online publication date: Jan-2011
