Research article
DOI: 10.1145/3581641.3584047

SoundToons: Exemplar-Based Authoring of Interactive Audio-Driven Animation Sprites

Published: 27 March 2023

Abstract

Animations can come to life when they are synchronized with relevant sounds. Yet synchronizing animation to audio requires tedious key-framing or programming, which is difficult for novice creators. Existing tools support audio-driven live animation, but they focus primarily on speech and offer little or no support for non-speech sounds. We present SoundToons, an exemplar-based authoring tool for interactive, audio-driven animation that focuses on non-speech sounds. Our tool enables novice creators to author live animations driven by a wide variety of non-speech sounds, such as clapping and instrumental music. We support two types of audio interaction: (1) discrete interaction, which triggers an animation when a discrete sound event is detected, and (2) continuous interaction, which synchronizes an animation to continuous audio parameters. By employing an exemplar-based iterative authoring approach, we empower novice creators to design and quickly refine interactive animations. User evaluations demonstrate that novice users can author and perform live audio-driven animation intuitively. Moreover, compared to other input modalities such as trackpads or foot pedals, users preferred audio as an intuitive way to drive animation.
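
To make the two interaction types concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how a discrete sound-event trigger and a continuous audio-parameter mapping might be prototyped offline with librosa; the input file name and the animation-side calls are placeholders.

```python
# Hypothetical sketch: prototyping the two audio interaction types described
# in the abstract. The file name and animation-side calls are placeholders.
import librosa
import numpy as np

y, sr = librosa.load("performance.wav", sr=22050)  # hypothetical input recording

# Discrete interaction: detect sound events (e.g., claps) as onsets and
# trigger an animation frame swap at each detected event time.
onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")
for t in onset_times:
    print(f"{t:.2f}s: trigger sprite swap")  # stand-in for an animation call

# Continuous interaction: map a continuous audio parameter (RMS loudness)
# to a continuous animation parameter (e.g., sprite scale per frame).
rms = librosa.feature.rms(y=y)[0]
scale = np.interp(rms, (rms.min(), rms.max()), (0.5, 2.0))  # frame-wise scale values
```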


Cited By

  • (2023) Soundify: Matching Sound Effects to Video. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1–13. https://doi.org/10.1145/3586183.3606823. Online publication date: 29 October 2023.

Published In

IUI '23: Proceedings of the 28th International Conference on Intelligent User Interfaces
March 2023
972 pages
ISBN: 9798400701061
DOI: 10.1145/3581641
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 March 2023

Author Tags

  1. Audio interface
  2. interactive systems
  3. live animation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • JST CREST

Conference

IUI '23

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%

Article Metrics

  • Downloads (last 12 months): 71
  • Downloads (last 6 weeks): 7
Reflects downloads up to 07 Jan 2025
