Abstract
Given the current profusion of devices for viewing media, video content created at one aspect ratio is often viewed on displays with different aspect ratios. Many previous solutions address this problem by retargeting or resizing the video, but a more general solution would re-edit the video for the new display. Our method employs the three primary editing operations: pan, cut, and zoom. We let viewers implicitly reveal what is important in a video by tracking their gaze as they watch the video. We present an algorithm that optimizes the path of a cropping window based on the collected eyetracking data, finds places to cut, and computes the size of the cropping window. We present results on a variety of video clips, including close-up and distant shots, and stationary and moving cameras. We conduct two experiments to evaluate our results. First, we eyetrack viewers on the result videos generated by our algorithm, and second, we perform a subjective assessment of viewer preference. These experiments show that viewer gaze patterns are similar on our result videos and on the original video clips, and that viewers prefer our results to an optimized crop-and-warp algorithm.
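As a rough illustration of the pan and cut operations described above, the following minimal sketch derives a horizontal cropping-window path from per-frame gaze samples. It is not the optimization presented in the paper: the per-frame median, the moving-average smoothing, the jump threshold for cuts, and every name in it (`crop_path`, `gaze_x`, `smooth`, `cut_jump`) are assumptions made for this example, and zoom (a varying window size) is left out.

```python
from statistics import median


def crop_path(gaze_x, frame_w, crop_w, smooth=5, cut_jump=None):
    """Illustrative only: horizontal crop-window centres from viewer gaze.

    gaze_x  -- list with one entry per frame; each entry is a non-empty list
               of viewer gaze x-coordinates in pixels (assumed input format)
    frame_w -- width of the original frame in pixels
    crop_w  -- fixed width of the cropping window in pixels
    Returns (centers, cuts): per-frame window centre and cut frame indices.
    """
    half = crop_w / 2.0
    if cut_jump is None:
        # Assumption: a jump larger than one window width is treated as a cut.
        cut_jump = crop_w

    # 1. Per-frame target: median gaze across viewers, clamped so the
    #    window never leaves the original frame.
    targets = [min(max(median(xs), half), frame_w - half) for xs in gaze_x]

    # 2. Cut wherever the target jumps too far between consecutive frames.
    cuts = [i for i in range(1, len(targets))
            if abs(targets[i] - targets[i - 1]) > cut_jump]

    # 3. Pan: smooth the targets with a centred moving average, separately
    #    inside each segment so that smoothing never blurs across a cut.
    centers = []
    bounds = [0] + cuts + [len(targets)]
    for start, end in zip(bounds, bounds[1:]):
        seg = targets[start:end]
        for i in range(len(seg)):
            lo = max(0, i - smooth // 2)
            hi = min(len(seg), i + smooth // 2 + 1)
            centers.append(sum(seg[lo:hi]) / (hi - lo))
    return centers, cuts


if __name__ == "__main__":
    # Three viewers look left for three frames, then right for three frames.
    gaze = [[100, 110, 120]] * 3 + [[900, 910, 920]] * 3
    print(crop_path(gaze, frame_w=1000, crop_w=400))
```

In this toy input the gaze target moves farther than one window width between the third and fourth frames, so the sketch places a cut there instead of panning across the frame.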
Supplemental Material
Supplemental movie, appendix, image, and software files for Gaze-Driven Video Re-Editing.