SmartShots: An Optimization Approach for Generating Videos with Data Visualizations Embedded

Published: 04 March 2022


Video is a well-received medium for storytellers to communicate narratives. To further engage viewers, we introduce a novel visual medium in which data visualizations are embedded into videos to present data insights. However, creating such data-driven videos requires professional video editing skills, data visualization knowledge, and even design talent. To ease this difficulty, we propose an optimization method and develop SmartShots, which facilitates the automatic integration of in-video visualizations. To develop it, we first collaborated with experts from different backgrounds, including information visualization, design, and video production. Our discussions led to a design space that summarizes crucial design considerations along three dimensions: visualization, embedded layout, and rhythm. Based on this design space, we formulated an optimization problem that addresses two challenges: (1) embedding visualizations while considering both contextual relevance and aesthetic principles and (2) generating videos by assembling multimedia materials. We show how SmartShots solves this optimization problem and demonstrate its usage in three cases. Finally, we report the results of semi-structured interviews with experts and amateur users on the usability of SmartShots.



      Information & Contributors


      Published In

      ACM Transactions on Interactive Intelligent Systems, Volume 12, Issue 1
      March 2022
      206 pages


      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 04 March 2022
      Accepted: 01 August 2021
      Revised: 01 May 2021
      Received: 01 August 2020
      Published in TIIS Volume 12, Issue 1


      Author Tags

      1. Visualization
      2. data-driven videos
      3. optimization


      • Research-article
      • Refereed

      Funding Sources

      • NSFC
      • NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization
      • Zhejiang Provincial Natural Science Foundation
      • Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies
      • Microsoft Research Asia
      • XJTLU Research Development Funding


      Bibliometrics & Citations

      Article Metrics

      • Downloads (Last 12 months): 116
      • Downloads (Last 6 weeks): 10
      Reflects downloads up to 03 Mar 2025

      Cited By

      • (2025) VisTellAR: Embedding Data Visualization to Short-Form Videos Using Mobile Augmented Reality. IEEE Transactions on Visualization and Computer Graphics 31:3 (1862-1874). DOI: 10.1109/TVCG.2024.3372104. Online publication date: 1-Mar-2025
      • (2024) The Generative Fairy Tale of Scary Little Red Riding Hood. Proceedings of the 2024 ACM International Conference on Interactive Media Experiences (129-144). DOI: 10.1145/3639701.3656303. Online publication date: 7-Jun-2024
      • (2024) Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-19). DOI: 10.1145/3613904.3642726. Online publication date: 11-May-2024
      • (2024) From Data to Story: Towards Automatic Animated Data Video Creation with LLM-Based Multi-Agent Systems. 2024 IEEE VIS Workshop on Data Storytelling in an Era of Generative AI (GEN4DS) (20-27). DOI: 10.1109/GEN4DS63889.2024.00008. Online publication date: 13-Oct-2024
      • (2023) Automated Conversion of Music Videos into Lyric Videos. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (1-11). DOI: 10.1145/3586183.3606757. Online publication date: 29-Oct-2023
      • (2023) Designing for Visualization in Motion: Embedding Visualizations in Swimming Videos. IEEE Transactions on Visualization and Computer Graphics 30:3 (1821-1836). DOI: 10.1109/TVCG.2023.3341990. Online publication date: 12-Dec-2023
