Research Article
DOI: 10.1145/3338533.3366602

Learn to Gesture: Let Your Body Speak

Published: 10 January 2020

Abstract

Presentation is one of the most important and vivid ways to deliver information to an audience. Beyond the content of a presentation, how the speaker behaves while presenting makes a big difference. In other words, gestures, as part of the visual channel and synchronized with verbal information, convey subtle information that voice or words alone cannot. One of the most effective ways to improve a presentation is to practice with feedback and suggestions from an expert; however, hiring human experts is expensive and thus impractical most of the time. Towards this end, we propose a speech-to-gesture network (POSE) that generates exemplary body language given speech (vocal behavior) as input. Specifically, we build an "expert" Speech-Gesture database from featured TED talk videos, and design a two-layer attentive recurrent encoder-decoder network to learn the translation from speech to gesture, as well as the hierarchical structure within gestures. Finally, given a speech audio sequence, an appropriate gesture sequence is generated and visualized for more effective communication. Both objective and subjective evaluations show the effectiveness of the proposed method.
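The paper itself does not ship code, so the following is only a rough illustration of the kind of attentive recurrent encoder-decoder the abstract describes: it maps a sequence of per-frame speech features to a sequence of pose keypoints. The framework (PyTorch), layer choices, feature dimensions, and names such as Speech2Gesture are assumptions for the sketch, not the authors' design.

# Minimal sketch of a speech-to-gesture encoder-decoder with attention.
# Not the authors' implementation; all sizes and names are assumptions.
import torch
import torch.nn as nn

class Speech2Gesture(nn.Module):
    def __init__(self, speech_dim=40, hidden=256, pose_dim=36):
        super().__init__()
        # Bidirectional GRU encoder over per-frame speech features (e.g. MFCCs).
        self.encoder = nn.GRU(speech_dim, hidden, batch_first=True, bidirectional=True)
        # Scores each encoder frame against the current decoder state.
        self.attn = nn.Linear(2 * hidden + hidden, 1)
        # GRU decoder that emits one pose vector (flattened 2D keypoints) per step.
        self.decoder = nn.GRUCell(2 * hidden + pose_dim, hidden)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, speech, steps):
        enc, _ = self.encoder(speech)                       # (B, T, 2H)
        batch = speech.size(0)
        h = speech.new_zeros(batch, self.decoder.hidden_size)
        pose = speech.new_zeros(batch, self.out.out_features)
        outputs = []
        for _ in range(steps):
            # Attention weights over the speech frames, conditioned on h.
            scores = self.attn(torch.cat(
                [enc, h.unsqueeze(1).expand(-1, enc.size(1), -1)], dim=-1))
            context = (torch.softmax(scores, dim=1) * enc).sum(dim=1)  # (B, 2H)
            h = self.decoder(torch.cat([context, pose], dim=-1), h)
            pose = self.out(h)
            outputs.append(pose)
        return torch.stack(outputs, dim=1)                   # (B, steps, pose_dim)

# Usage: 2 clips, 100 speech frames of 40-dim features, predict 50 pose frames.
model = Speech2Gesture()
poses = model(torch.randn(2, 100, 40), steps=50)             # -> (2, 50, 36)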



        Information

        Published In

        MMAsia '19: Proceedings of the 1st ACM International Conference on Multimedia in Asia
        December 2019
        403 pages
        ISBN:9781450368414
        DOI:10.1145/3338533

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 10 January 2020

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • National Natural Science Foundation of China
        • The Future Talents Research Funds of Shandong University
        • The Fundamental Research Funds of Shandong University
        • The Shandong Provincial Natural Science Foundation
        • The Project of Thousand Youth Talents 2016

        Conference

        MMAsia '19
        Sponsor:
        MMAsia '19: ACM Multimedia Asia
        December 15 - 18, 2019
        Beijing, China

        Acceptance Rates

        MMAsia '19 paper acceptance rate: 59 of 204 submissions (29%)
        Overall acceptance rate: 59 of 204 submissions (29%)

