Speech-Driven Gesture Generation Using Transformer-Based Denoising Diffusion Probabilistic Models


Abstract:

While it is crucial for human-like avatars to perform co-speech gestures, existing approaches struggle to generate natural and realistic movements. In the present study, a novel transformer-based denoising diffusion model is proposed to generate co-speech gestures. Moreover, we introduce a practical sampling trick for diffusion models to maintain the continuity between the generated motion segments while improving the within-segment motion likelihood and naturalness. Our model can be used for online generation since it generates gestures for a short segment of speech, e.g., 2 s. We evaluate our model on two large-scale speech-gesture datasets with finger movements using objective measurements and a user study, showing that our model outperforms all baselines. Our user study is based on the MetaHuman platform in the Unreal Engine, a popular tool for creating human-like avatars and motions.
Published in: IEEE Transactions on Human-Machine Systems (Volume: 54, Issue: 6, December 2024)
Page(s): 733-742
Date of Publication: 09 October 2024
