DOI:10.1145/1873951.1873962
research-article

Crowdsourced automatic zoom and scroll for video retargeting

Published: 25 October 2010

Abstract

Screen size and display resolution limit the experience of watching videos on mobile devices. The viewing experience can be improved by determining important or interesting regions within the video (called regions of interest, or ROIs) and displaying only the ROIs to the viewer. Previous work focuses on analyzing the video content using visual attention models to infer the ROIs. Such content-based techniques, however, have limitations. In this paper, we propose an alternative paradigm for inferring ROIs from a video. We crowdsource from a large number of users through their implicit viewing behavior in a zoom and pan interface, and infer the ROIs from their collective wisdom. A retargeted video, consisting of relevant shots determined from historical user behavior, can be automatically generated and replayed to subsequent users who would prefer a less interactive viewing experience. This paper presents how we collect the user traces, infer the ROIs and their dynamics, group the ROIs into shots, and automatically reframe those shots to improve the aesthetics of the video. A user study with 48 participants shows that our automatically retargeted video is of comparable quality to one handcrafted by an expert user.
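To make the idea of inferring an ROI from collective viewing behavior concrete, here is a minimal sketch for a single frame. It is not the method of the paper, whose details the abstract does not give. It assumes each user trace reduces, per frame, to a viewport rectangle `(x, y, width, height)`, and it substitutes a simple grid-voting heuristic: a cell belongs to the ROI if at least a fraction `min_support` of users had it on screen. The function name `infer_roi` and the parameters `cell` and `min_support` are hypothetical.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def infer_roi(viewports: List[Rect], frame_w: int, frame_h: int,
              cell: int = 16, min_support: float = 0.5) -> Optional[Rect]:
    """Bounding box of grid cells seen by at least min_support of users.

    Illustrative voting heuristic only; a cell counts as seen when a
    viewport overlaps any part of it.
    """
    if not viewports:
        return None
    cols = (frame_w + cell - 1) // cell
    rows = (frame_h + cell - 1) // cell
    votes = [[0] * cols for _ in range(rows)]

    for x, y, w, h in viewports:
        # Clamp the viewport to the frame before mapping it to grid cells.
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
        if x1 <= x0 or y1 <= y0:
            continue  # viewport lies entirely outside the frame
        for r in range(y0 // cell, (y1 - 1) // cell + 1):
            for c in range(x0 // cell, (x1 - 1) // cell + 1):
                votes[r][c] += 1

    threshold = min_support * len(viewports)
    hot = [(r, c) for r in range(rows) for c in range(cols)
           if votes[r][c] >= threshold]
    if not hot:
        return None  # users did not agree on any region

    r_min = min(r for r, _ in hot)
    r_max = max(r for r, _ in hot)
    c_min = min(c for _, c in hot)
    c_max = max(c for _, c in hot)
    x, y = c_min * cell, r_min * cell
    w = min(frame_w, (c_max + 1) * cell) - x
    h = min(frame_h, (r_max + 1) * cell) - y
    return (x, y, w, h)
```

For example, with three users viewing `(0, 0, 64, 64)`, `(32, 32, 64, 64)`, and `(32, 32, 32, 32)` in a 128 by 128 frame, only the region all three overlap reaches the default majority threshold, so the call returns `(32, 32, 32, 32)`. Tracking ROI dynamics and grouping ROIs into shots, which the abstract also mentions, would need per-frame results to be linked over time and are outside this sketch.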


Published In

MM '10: Proceedings of the 18th ACM international conference on Multimedia
October 2010
1836 pages
ISBN:9781605589336
DOI:10.1145/1873951
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. automatic zoom and pan
  2. crowdsourcing
  3. video retargeting
  4. zoomable video

Qualifiers

  • Research-article

Conference

MM '10: ACM Multimedia Conference
October 25 - 29, 2010
Firenze, Italy

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
