A Top-Down Approach for Video Summarization

Authors:

David Dagan Feng

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 11, Issue 1

Article No.: 4, Pages 1 - 21

https://doi.org/10.1145/2632267

Published: 04 September 2014

Abstract

While most existing video summarization approaches aim to identify important frames of a video from either a global or a local perspective, we propose a top-down approach consisting of scene identification and scene summarization. For scene identification, we represent each frame with global features and apply a scalable clustering method. We then formulate scene summarization as choosing the frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual word-based formulation to make the approach more computationally scalable. Experimental results on two benchmark datasets demonstrate that the proposed approach clearly outperforms the state of the art.
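The scene summarization step described above (choose frames that cover the local descriptors with little redundancy) can be illustrated with a small sketch. The abstract does not state the exact objective or solver, so the code below is an assumption: it treats each frame as a set of quantized visual-word ids and uses a standard greedy set-cover heuristic. The function name `greedy_cover`, the `min_gain` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def greedy_cover(frame_words: Dict[int, Set[int]], min_gain: int = 1) -> List[int]:
    """Greedily pick frames whose visual words cover the scene.

    frame_words maps a frame index to the set of visual-word ids found in
    that frame (assumed to come from quantized local descriptors).
    Selection stops when no frame adds at least `min_gain` new words,
    which keeps redundant frames out of the summary.
    """
    # All words that appear anywhere in the scene still need to be covered.
    uncovered: Set[int] = set().union(*frame_words.values())
    selected: List[int] = []
    while uncovered:
        # Frame covering the most uncovered words; ties go to the earliest frame.
        best = max(sorted(frame_words), key=lambda f: len(frame_words[f] & uncovered))
        gain = len(frame_words[best] & uncovered)
        if gain < min_gain:
            break
        selected.append(best)
        uncovered -= frame_words[best]
    # Return key frames in temporal order.
    return sorted(selected)


# Toy scene with four frames (hypothetical visual-word ids).
scene = {0: {1, 2, 3}, 1: {2, 3}, 2: {3, 4, 5, 6}, 3: {6, 7}}
print(greedy_cover(scene))              # [0, 2, 3]
print(greedy_cover(scene, min_gain=2))  # [0, 2]
```

Frame 1 is never selected because every word it contains already appears in frame 0, which is the redundancy-avoidance behaviour the abstract describes. Raising `min_gain` trades coverage for a shorter summary.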

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 11, Issue 1

August 2014

151 pages

ISSN: 1551-6857

EISSN: 1551-6865

DOI: 10.1145/2665935

Editor:
Ralf Steinmetz
Technische Universität Darmstadt, Germany

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 September 2014

Accepted: 01 April 2014

Revised: 01 January 2014

Received: 01 October 2013

Published in TOMM Volume 11, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Australian Research Council
Fundamental Research Funds for the Central Universities (3102014JCQ01054)
Natural Science Foundation of Shaanxi Province
National ICT Australia (NICTA)

Article Metrics

Total Citations: 39
Total Downloads: 480
Downloads (Last 12 months): 14
Downloads (Last 6 weeks): 1

Reflects downloads up to 03 Mar 2025

Cited By

Han T, Zhou Q, Yu J, Yu Z, Zhang J, Zhao S (2024). Effective Video Summarization by Extracting Parameter-Free Motion Attention. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7 (1-20). Online publication date: 30-Mar-2024.
https://dl.acm.org/doi/10.1145/3654670
Lan S, Wang Z, Wei E, Roy-Chowdhury A, Zhu Q (2024). Collaborative Multi-Agent Video Fast-Forwarding. IEEE Transactions on Multimedia 26 (1041-1054). Online publication date: 2024.
https://doi.org/10.1109/TMM.2023.3275853
You W, Ji J, Sun L, Yang C, Yu M, Chen S, Yao J (2024). Automatic Generation of Interactive Nonlinear Video for Online Apparel Shopping Navigation. IEEE Transactions on Multimedia 26 (474-486). Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TMM.2023.3266615
Zang S, Yu H, Song Y, Zeng R (2023). Unsupervised video summarization using deep Non-Local video summarization networks. Neurocomputing 519:C (26-35). Online publication date: 28-Jan-2023.
https://dl.acm.org/doi/10.1016/j.neucom.2022.11.028
Wu X, Ma M, Wan S, Han X, Mei S (2023). Multi-scale deep feature fusion based sparse dictionary selection for video summarization. Signal Processing: Image Communication 118 (117006). Online publication date: Oct-2023.
https://doi.org/10.1016/j.image.2023.117006
Sabha A, Selwal A (2023). Towards machine vision-based video analysis in smart cities: a survey, framework, applications and open issues. Multimedia Tools and Applications 83:22 (62107-62158). Online publication date: 9-Aug-2023.
https://doi.org/10.1007/s11042-023-16434-2
Sabha A, Selwal A (2023). Data-driven enabled approaches for criteria-based video summarization: a comprehensive survey, taxonomy, and future directions. Multimedia Tools and Applications 82:21 (32635-32709). Online publication date: 2-Mar-2023.
https://doi.org/10.1007/s11042-023-14925-w
Ghatak S, Rup S, Behera A, Majhi B, Swamy M (2022). An improved tube rearrangement strategy for choice-based surveillance video synopsis generation. Digital Signal Processing 132:C. Online publication date: 1-Dec-2022.
https://dl.acm.org/doi/10.1016/j.dsp.2022.103817
Nair M, Mohan J (2022). VSMCNN-dynamic summarization of videos using salient features from multi-CNN model. Journal of Ambient Intelligence and Humanized Computing 14:10 (14071-14080). Online publication date: 25-Jun-2022.
https://doi.org/10.1007/s12652-022-04112-4
Lin F, Zhou W, Deng J, Li B, Lu Y, Li H (2021). Residual Refinement Network with Attribute Guidance for Precise Saliency Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 17:3 (1-19). Online publication date: 22-Jul-2021.
https://dl.acm.org/doi/10.1145/3440694
