Poster
DOI: 10.1145/3450618.3469163

Text-Based Motion Synthesis with a Hierarchical Two-Stream RNN

Published: 06 August 2021

Abstract

We present a learning-based method for generating animated 3D pose sequences depicting multiple sequential or superimposed actions described in long, compositional sentences. We propose a hierarchical two-stream sequential model to explore a finer joint-level mapping between natural language sentences and the corresponding 3D pose sequences of the motions. We learn two manifold representations of the motion, one each for the upper-body and the lower-body movements. We evaluate our proposed model on the publicly available KIT Motion-Language Dataset, which contains 3D pose data with human-annotated sentences. Experimental results show that our model advances the state of the art in text-based motion synthesis by a margin of 50% in objective evaluations.
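The abstract describes the two-stream architecture only at a high level. As a rough illustration of the core idea, the following is a minimal PyTorch sketch in which a pooled sentence embedding (e.g., from BERT [2]) conditions two separate recurrent streams, one per body half, whose per-frame outputs are concatenated into full-body poses. All module names, dimensions, and the joint split here are illustrative assumptions, not the authors' implementation, which additionally learns joint-level mappings and motion manifolds not reproduced in this sketch.

# Minimal sketch of a two-stream text-to-motion decoder (assumed design,
# not the authors' released code).
import torch
import torch.nn as nn

class TwoStreamMotionDecoder(nn.Module):
    def __init__(self, text_dim=768, hidden_dim=256,
                 upper_joints=13, lower_joints=8, joint_dim=3):
        super().__init__()
        # Shared projection of the sentence embedding (e.g., a BERT pooled vector)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # One recurrent stream per body half, each modeling its own motion subspace
        self.upper_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.lower_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.upper_head = nn.Linear(hidden_dim, upper_joints * joint_dim)
        self.lower_head = nn.Linear(hidden_dim, lower_joints * joint_dim)

    def forward(self, sent_emb, num_frames):
        # sent_emb: (batch, text_dim) pooled sentence embedding
        h = torch.tanh(self.text_proj(sent_emb))          # (batch, hidden_dim)
        # Repeat the conditioning vector once per output frame
        cond = h.unsqueeze(1).repeat(1, num_frames, 1)    # (batch, T, hidden_dim)
        up, _ = self.upper_rnn(cond)
        lo, _ = self.lower_rnn(cond)
        # Concatenate the two streams into full-body poses per frame
        return torch.cat([self.upper_head(up), self.lower_head(lo)], dim=-1)

model = TwoStreamMotionDecoder()
poses = model(torch.randn(2, 768), num_frames=60)
print(poses.shape)  # torch.Size([2, 60, 63]) = (batch, frames, joints * 3)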

Supplementary Material

VTT File (3450618.3469163.vtt)
MP4 File (3450618.3469163.mp4): Presentation.

References

[1] Chaitanya Ahuja and Louis-Philippe Morency. 2019. Language2Pose: Natural Language Grounded Pose Forecasting. In 2019 International Conference on 3D Vision (3DV). 719–728. https://doi.org/10.1109/3DV.2019.00084
[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805 (2018).
[3] Eva Hanser, Paul Mc Kevitt, Tom Lunney, and Joan Condell. 2009. SceneMaker: Intelligent Multimodal Visualisation of Natural Language Scripts. In Irish Conference on Artificial Intelligence and Cognitive Science. Springer, 144–153.
[4] Angela S. Lin, Lemeng Wu, Rodolfo Corona, Kevin Tai, Qixing Huang, and Raymond J. Mooney. 2018. Generating Animated Videos of Human Activities from Natural Language Descriptions. In Visually Grounded Interaction and Language Workshop, NeurIPS (2018).
[5] Matthias Plappert, Christian Mandery, and Tamim Asfour. 2016. The KIT Motion-Language Dataset. Big Data 4, 4 (Dec. 2016), 236–252. https://doi.org/10.1089/big.2016.0028

Cited By

  • ASAP for multi-outputs: auto-generating storyboard and pre-visualization with virtual actors based on screenplay. Multimedia Tools and Applications (2024). https://doi.org/10.1007/s11042-024-19904-3. Online publication date: 3 Aug 2024.
  • Large Motion Model for Unified Multi-modal Motion Generation. Computer Vision – ECCV 2024, 397–421. https://doi.org/10.1007/978-3-031-72624-8_23. Online publication date: 26 Oct 2024.
  • AvatarCLIP. ACM Transactions on Graphics 41, 4 (2022), 1–19. https://doi.org/10.1145/3528223.3530094. Online publication date: 22 Jul 2022.


Published In

SIGGRAPH '21: ACM SIGGRAPH 2021 Posters
August 2021
90 pages
ISBN:9781450383714
DOI:10.1145/3450618
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

SIGGRAPH '21

Acceptance Rates

Overall acceptance rate: 704 of 3,473 submissions (20%)

Contributors

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 0
Reflects downloads up to 27 Feb 2025

