Abstract
Efficient video summarization lets users explore video content that matches their intentions with low cognitive demand. In this paper, we present a novel approach for summarizing videos as multi-scale structures that exhibit different video features at different scale levels and allow video content to be explored through multi-scale interaction. The approach addresses the semantic relationships between structures and integrates user intention into both the summarization and the interaction. We first introduce the concept of multi-scale structures for summarizing video content and describe three types of structures that present important features at different scale levels. We then provide a continuous zooming interaction for browsing these multi-scale structures. Finally, a detailed user study shows that the approach improves user performance in understanding and browsing videos.
Cite this article
Wang, H., Ma, C. Interactive multi-scale structures for summarizing video content. Sci. China Inf. Sci. 56, 1–12 (2013). https://doi.org/10.1007/s11432-013-4833-6
DOI: https://doi.org/10.1007/s11432-013-4833-6