Aesthetics-Guided Summarization from Multiple User Generated Videos

Published: 07 January 2015

Abstract

In recent years, the rapid development of camera technology and portable devices has led to a surge in user generated videos, which are gradually reshaping a media market traditionally oriented toward professional video. The volume of user generated videos in repositories is increasing rapidly. In today's video retrieval systems, a simple query returns many videos, which seriously increases the viewing burden. To manage these retrieval results and give viewers an efficient way to browse, we introduce a system that automatically generates a summary from multiple user generated videos and presents their salient content to viewers in an enjoyable manner. Among multiple consumer videos, we find quality to be highly diverse due to factors such as the photographer's experience or the environmental conditions at the time of capture. This diversity in quality motivates us to include a video quality evaluation component in the video summarization, since poor-quality videos can seriously degrade the viewing experience. We first propose a probabilistic model to evaluate the aesthetic quality of each user generated video. This model compares the rich aesthetics information from several well-known photo databases with generic unlabeled consumer videos, under a human perception component indicating the correlation between a video and its constituent frames. Subjective studies were carried out, and the results indicate that our method is reliable. We then propose a novel graph-based formulation for the multi-video summarization task. Desirable summarization criteria are incorporated as graph attributes, and the problem is solved through a dynamic programming framework. Comparisons with several state-of-the-art methods demonstrate that our algorithm outperforms them in generating a skimming video that preserves the essential scenes from the multiple input videos, with smooth transitions among consecutive segments and appealing overall aesthetics.
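
The abstract does not give the graph attributes or the recurrence, so the following is only a generic sketch of how a graph-based selection can be solved with dynamic programming, not the authors' formulation. It assumes each candidate segment is a node with a single score (for example, a blend of aesthetic quality and importance) and that each chronologically ordered pair of segments has a transition score rewarding smooth cuts. The names `select_segments`, `scores`, `transition`, and `k` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def select_segments(
    scores: Sequence[float],
    transition: Callable[[int, int], float],
    k: int,
) -> Tuple[float, List[int]]:
    """Pick k segments in chronological order, maximizing the sum of
    node scores plus transition scores between consecutive picks."""
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of segments")

    neg_inf = float("-inf")
    # best[j][i]: best total for a path of j + 1 segments ending at segment i.
    best = [[neg_inf] * n for _ in range(k)]
    # back[j][i]: predecessor of segment i on that best path.
    back = [[-1] * n for _ in range(k)]
    for i in range(n):
        best[0][i] = scores[i]

    for j in range(1, k):
        for i in range(j, n):
            for p in range(j - 1, i):
                cand = best[j - 1][p] + transition(p, i) + scores[i]
                if cand > best[j][i]:
                    best[j][i] = cand
                    back[j][i] = p

    # Choose the best end node, then walk the back-pointers.
    end = max(range(k - 1, n), key=lambda i: best[k - 1][i])
    total = best[k - 1][end]
    path = [end]
    for j in range(k - 1, 0, -1):
        path.append(back[j][path[-1]])
    path.reverse()
    return total, path


if __name__ == "__main__":
    # Hypothetical scores for four candidate segments; the transition
    # function here penalizes the jump between segments 0 and 2.
    demo_scores = [5, 1, 4, 3]
    penalty = lambda p, i: -10 if (p, i) == (0, 2) else 0
    print(select_segments(demo_scores, penalty, k=2))
```

The table has k rows and n columns, and each cell scans its possible predecessors, so the sketch runs in O(k n^2) time. A real system would likely also need a duration budget and coverage constraints, which are omitted here.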


Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 2
December 2014
197 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2716635
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 January 2015
Accepted: 01 August 2014
Revised: 01 August 2014
Received: 01 February 2014
Published in TOMM Volume 11, Issue 2

Author Tags

  1. Quality assessment
  2. user generated videos
  3. video quality
  4. video summary

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative
  • IDM Programme Office through the Centre of Social Media Innovations for Communities (COSMIC)

Cited By

  • (2024) An Aesthetic-Guided Multimodal Framework for Video Summarization. 2024 IEEE International Conference on Multimedia and Expo (ICME), 1-6. DOI: 10.1109/ICME57554.2024.10687758. Online publication date: 15-Jul-2024
  • (2024) An Aesthetic-Driven Approach to Unsupervised Video Summarization. IEEE Access 12, 128768-128777. DOI: 10.1109/ACCESS.2024.3434508. Online publication date: 2024
  • (2023) 2BiVQA: Double Bi-LSTM-based Video Quality Assessment of UGC Videos. ACM Transactions on Multimedia Computing, Communications, and Applications 20:4, 1-22. DOI: 10.1145/3632178. Online publication date: 8-Nov-2023
  • (2023) Multimodal-Based and Aesthetic-Guided Narrative Video Summarization. IEEE Transactions on Multimedia 25, 4894-4908. DOI: 10.1109/TMM.2022.3183394. Online publication date: 1-Jan-2023
  • (2023) Towards machine vision-based video analysis in smart cities: a survey, framework, applications and open issues. Multimedia Tools and Applications 83:22, 62107-62158. DOI: 10.1007/s11042-023-16434-2. Online publication date: 9-Aug-2023
  • (2023) Data-driven enabled approaches for criteria-based video summarization: a comprehensive survey, taxonomy, and future directions. Multimedia Tools and Applications 82:21, 32635-32709. DOI: 10.1007/s11042-023-14925-w. Online publication date: 2-Mar-2023
  • (2021) Content selection criteria for news multi-video summarization based on human strategies. International Journal on Digital Libraries 22:1, 1-14. DOI: 10.1007/s00799-020-00281-9. Online publication date: 1-Mar-2021
  • (2020) Automatic Transformation of a Video Using Multimodal Information for an Engaging Exploration Experience. Applied Sciences 10:9, 3056. DOI: 10.3390/app10093056. Online publication date: 27-Apr-2020
  • (2020) Investigating Subjectivity Criterion for Multi-video Summarization. Proceedings of the Brazilian Symposium on Multimedia and the Web, 137-144. DOI: 10.1145/3428658.3430964. Online publication date: 30-Nov-2020
  • (2019) Survey of Compressed Domain Video Summarization Techniques. ACM Computing Surveys 52:6, 1-29. DOI: 10.1145/3355398. Online publication date: 16-Oct-2019
