An audio-driven dancing avatar

Ofli, Ferda; Demir, Yasemin; Yemez, Yücel; Erzin, Engin; Tekalp, A. Murat; Balcı, Koray; Kızoğlu, İdil; Akarun, Lale; Canton-Ferrer, Cristian; Tilmanne, Joëlle; Bozkurt, Elif; Erdem, A. Tanju

doi:10.1007/s12193-008-0009-x

Original Paper
Published: 31 May 2008

Volume 2, pages 93–103 (2008)

Journal on Multimodal User Interfaces

Abstract

We present a framework for training and synthesis of an audio-driven dancing avatar. The avatar is trained for a given musical genre using multicamera video recordings of a dance performance. The video is analyzed to capture the time-varying posture of the dancer’s body, whereas the musical audio signal is processed to extract the beat information. We consider two different marker-based schemes for the motion capture problem: the first uses 3D joint positions to represent the body motion, whereas the second uses joint angles. Body movements of the dancer are characterized by a set of recurring semantic motion patterns, i.e., dance figures. Each dance figure is modeled in a supervised manner with a set of hidden Markov model (HMM) structures and the associated beat frequency. In the synthesis phase, an audio signal of unknown musical type is first classified, within a time interval, into one of the genres learnt in the analysis phase, based on mel-frequency cepstral coefficients (MFCCs). The motion parameters of the corresponding dance figures are then synthesized via the trained HMM structures in synchrony with the audio signal, based on the estimated tempo information. Finally, the generated motion parameters, either the joint angles or the 3D joint positions of the body, are animated along with the musical audio using two different animation tools that we have developed. Experimental results demonstrate the effectiveness of the proposed framework.
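
To make the tempo-synchronized synthesis step more concrete, the following Python sketch shows one simple way a single dance figure could be generated from a trained left-to-right HMM and stretched to fit an estimated tempo. It is an illustration under stated assumptions, not the procedure described in the paper: it assumes Gaussian state emissions over a joint-angle vector, allocates frames to states in proportion to their expected durations 1/(1 − a_ii), and uniformly scales the figure so that it spans a fixed number of beats at the estimated beats per minute (bpm) and video frame rate. All names (`State`, `frames_for_figure`, `allocate_frames`, `synthesize_figure`) and all example parameters are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class State:
    """One state of a hypothetical left-to-right HMM modeling a dance figure."""
    mean: List[float]   # mean joint-angle vector emitted in this state (assumed Gaussian)
    std: List[float]    # per-dimension standard deviation of the emission
    self_loop: float    # self-transition probability a_ii, must satisfy 0 <= a_ii < 1


def frames_for_figure(beats: int, bpm: float, fps: float) -> int:
    """Number of video frames covered by a figure lasting `beats` beats at `bpm`."""
    return max(1, round(beats * 60.0 / bpm * fps))


def allocate_frames(states: Sequence[State], total: int) -> List[int]:
    """Split `total` frames across states in proportion to expected state durations.

    In a left-to-right HMM the expected stay in state i is 1 / (1 - a_ii) frames.
    Largest-remainder rounding keeps the counts summing exactly to `total`.
    """
    for s in states:
        if not 0.0 <= s.self_loop < 1.0:
            raise ValueError("self_loop must lie in [0, 1)")
    expected = [1.0 / (1.0 - s.self_loop) for s in states]
    scale = total / sum(expected)
    raw = [e * scale for e in expected]
    counts = [math.floor(r) for r in raw]
    leftover = total - sum(counts)
    # Hand the remaining frames to the states with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def synthesize_figure(
    states: Sequence[State],
    beats: int,
    bpm: float,
    fps: float,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return one pose vector per frame, time-scaled so the figure fills `beats` beats."""
    total = frames_for_figure(beats, bpm, fps)
    poses: List[List[float]] = []
    for state, n in zip(states, allocate_frames(states, total)):
        for _ in range(n):
            if rng is None:
                # Deterministic output: emit the state mean for every frame.
                poses.append(list(state.mean))
            else:
                # Stochastic output: sample each dimension from its Gaussian emission.
                poses.append([rng.gauss(m, s) for m, s in zip(state.mean, state.std)])
    return poses
```

For instance, with three states whose self-transition probabilities are 0.5, 0.75 and 0.5, a four-beat figure at 120 bpm and 30 frames/s spans 60 frames, split 15/30/15 across the states; lowering the tempo to 100 bpm stretches the same figure to 72 frames. Uniform scaling is the simplest choice for beat synchrony; passing a seeded `random.Random` instead of `None` adds Gaussian variation around each state mean.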

Author information

Authors and Affiliations

Multimedia, Vision and Graphics Laboratory, Koç University, İstanbul, Turkey
Ferda Ofli, Yasemin Demir, Yücel Yemez, Engin Erzin & A. Murat Tekalp
Multimedia Group, Boğaziçi University, İstanbul, Turkey
Koray Balcı, İdil Kızoğlu & Lale Akarun
Image and Video Processing Group, Technical University of Catalonia, Barcelona, Spain
Cristian Canton-Ferrer
TCTS Lab, Faculty of Engineering of Mons, Mons, Belgium
Joëlle Tilmanne
Momentum Digital Media Technologies, İstanbul, Turkey
Elif Bozkurt & A. Tanju Erdem

Corresponding author

Correspondence to Ferda Ofli.

About this article

Cite this article

Ofli, F., Demir, Y., Yemez, Y. et al. An audio-driven dancing avatar. J Multimodal User Interfaces 2, 93–103 (2008). https://doi.org/10.1007/s12193-008-0009-x

Received: 24 December 2007
Accepted: 05 March 2008
Published: 31 May 2008
Issue Date: September 2008
DOI: https://doi.org/10.1007/s12193-008-0009-x

Similar content being viewed by others

MIDGET: Music Conditioned 3D Dance Generation

Beat Synchronous Dance Animation Based on Visual Analysis of Human Motion and Audio Analysis of Music Tempo

BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis