MoViMash: online mobile video mashup

Published: 29 October 2012

Abstract

With the proliferation of mobile video cameras, it is becoming easier for users to capture videos of live performances and to share them socially with friends and the public. As an attendee of such a live performance typically has limited mobility, each video camera can capture only a restricted range of viewing angles and distances, producing a rather monotonous video clip. At such performances, however, multiple video clips can be captured by different users, likely from different angles and distances. These videos can be combined to produce a more interesting and representative mashup of the live performance for broadcasting and sharing. Earlier works select video shots merely based on the quality of the currently available videos. In a real video editing process, however, the recent selection history plays an important role in choosing future shots. In this work, we present MoViMash, a framework for automatic online video mashup that makes smooth shot transitions to cover the performance from diverse perspectives. Shot transition and shot length distributions are learned from professionally edited videos. Further, we introduce view quality assessment into the framework to filter out shaky, occluded, and tilted videos. To the best of our knowledge, this is the first attempt to incorporate history-based diversity measurement, state-based video editing rules, and view quality into automated video mashup generation. Experimental results demonstrate the effectiveness of the MoViMash framework.
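
The abstract outlines the selection logic only at a high level. Purely as an illustration, and not as the authors' implementation, the Python sketch below shows one way such a history-aware shot selector could be organized: low-quality views (shaky, occluded, tilted) are filtered out first, the next camera is chosen by trading off view quality against diversity with respect to the recent selection history, and the shot length is drawn from a distribution assumed to be learned offline from professionally edited videos. All names and parameters (next_shot, sample_shot_length, the weights, the quality floor) are hypothetical.

import random

# Hypothetical sketch (not the paper's code): greedy, history-aware shot selection.

def sample_shot_length(length_distribution):
    """Draw a shot length in seconds from a learned categorical distribution."""
    lengths, probs = zip(*length_distribution.items())
    return random.choices(lengths, weights=probs, k=1)[0]

def next_shot(views, history, w_quality=0.6, w_diversity=0.4, quality_floor=0.5):
    """Pick the camera id for the next shot.

    views   : dict camera_id -> quality score in [0, 1], already penalized for
              shake, occlusion, and tilt by a view-quality assessor.
    history : list of recently used camera ids, most recent last.
    """
    # View quality assessment: drop views below the quality floor
    # (fall back to all views if every camera is currently poor).
    usable = {c: q for c, q in views.items() if q >= quality_floor} or dict(views)

    def diversity(cam):
        # History-based diversity: cameras not used recently score higher.
        if cam not in history:
            return 1.0
        last_use = max(i for i, c in enumerate(history) if c == cam)
        return (len(history) - 1 - last_use) / max(len(history), 1)

    # Simple editing rule: avoid cutting straight back to the current camera.
    current = history[-1] if history else None
    candidates = {c: q for c, q in usable.items() if c != current} or usable

    # Combine quality and history-based diversity and pick the best camera.
    return max(candidates,
               key=lambda c: w_quality * candidates[c] + w_diversity * diversity(c))

if __name__ == "__main__":
    views = {"cam_A": 0.9, "cam_B": 0.7, "cam_C": 0.3}   # cam_C is too shaky
    history = ["cam_A", "cam_B", "cam_A"]                 # recent selections
    shot_lengths = {2: 0.2, 4: 0.5, 6: 0.3}               # learned length distribution
    print(next_shot(views, history), sample_shot_length(shot_lengths))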

Information

Published In

MM '12: Proceedings of the 20th ACM international conference on Multimedia
October 2012
1584 pages
ISBN:9781450310895
DOI:10.1145/2393347
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. mobile video
  2. video mashup
  3. virtual director

Qualifiers

  • Research-article

Conference

MM '12: ACM Multimedia Conference
October 29 - November 2, 2012
Nara, Japan

Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 0
Reflects downloads up to 27 Feb 2025

Citations

Cited By

  • (2024) Composition and Transmission of Videos Generated by Multiple Users. From Multimedia Communications to the Future Internet, pp. 202-218. DOI: 10.1007/978-3-031-71874-8_14. Online publication date: 13-Sep-2024.
  • (2023) A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model. Proceedings of the 31st ACM International Conference on Multimedia, pp. 6441-6450. DOI: 10.1145/3581783.3611878. Online publication date: 26-Oct-2023.
  • (2023) COSense: collaborative and opportunistic sensing of road events by vehicles' cameras. CCF Transactions on Pervasive Computing and Interaction, 5(3), pp. 276-287. DOI: 10.1007/s42486-023-00126-9. Online publication date: 15-Feb-2023.
  • (2022) PopStage. ACM Transactions on Graphics, 41(6), pp. 1-13. DOI: 10.1145/3550454.3555467. Online publication date: 30-Nov-2022.
  • (2022) Automated video editing based on learned styles using LSTM-GAN. Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, pp. 73-80. DOI: 10.1145/3477314.3507141. Online publication date: 25-Apr-2022.
  • (2021) Learning to Visualize Music Through Shot Sequence for Automatic Concert Video Mashup. IEEE Transactions on Multimedia, 23, pp. 1731-1743. DOI: 10.1109/TMM.2020.3003631. Online publication date: 2021.
  • (2021) Reinforcement Learning Based Automatic Personal Mashup Generation. 2021 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. DOI: 10.1109/ICME51207.2021.9428357. Online publication date: 5-Jul-2021.
  • (2020) Learning From Music to Visual Storytelling of Shots: A Deep Interactive Learning Mechanism. Proceedings of the 28th ACM International Conference on Multimedia, pp. 102-110. DOI: 10.1145/3394171.3413985. Online publication date: 12-Oct-2020.
  • (2020) Prosuming Live Multimedia Content at the Edge. Proceedings of the 2020 ACM International Conference on Interactive Media Experiences, pp. 160-164. DOI: 10.1145/3391614.3399393. Online publication date: 17-Jun-2020.
  • (2020) PopMash: an automatic musical-mashup system using computation of musical and lyrical agreement for transitions. Multimedia Tools and Applications. DOI: 10.1007/s11042-020-08934-2. Online publication date: 13-May-2020.
