
Kernel-Based Lip Shape Clustering with Phoneme Recognition for Real-Time Voice Driven Talking Face

  • Conference paper
Advances in Neural Networks - ISNN 2010 (ISNN 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6064)


Abstract

This work describes a real-time voice-driven method that synchronizes a speaker’s lip shape with the corresponding speech signal on low-bandwidth mobile devices. Phoneme recognition is generally regarded as an important task in the operation of a real-time lip-sync system. In this work, we use a kernel-based lip shape clustering algorithm inspired by one-class support vector machines (SVM). Speakers with similar lip shapes are clustered together, and a cluster-dependent vowel phoneme is then constructed for each cluster. We use the sum of absolute differences (SAD) as the vowel lip shape likelihood for clustering lip shapes into categories. For lip-sync animation, the transparency levels of the source and destination lip shape pictures are then adjusted using alpha blending. We find that this method outperforms the conventional CHMM method in phoneme error rate (PER): 8.78% versus 32.25%, respectively.
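
The abstract names two standard image operations, the sum of absolute differences (SAD) and alpha blending, without giving implementation details. The following is a minimal sketch of the textbook definitions of both, not the authors' implementation. The grayscale list-of-rows image representation, the function names (`sad`, `nearest_category`, `alpha_blend`), and the idea of picking the template with the smallest SAD are assumptions made for illustration. The one-class SVM clustering step is not shown.

```python
from typing import List, Sequence

# Assumed representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def sad(a: Image, b: Image) -> float:
    """Sum of absolute differences between two equally sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def nearest_category(lip: Image, templates: Sequence[Image]) -> int:
    """Index of the template lip shape with the smallest SAD to `lip`.

    Hypothetical helper: a lower SAD is treated here as a better match.
    """
    return min(range(len(templates)), key=lambda i: sad(lip, templates[i]))


def alpha_blend(src: Image, dst: Image, alpha: float) -> Image:
    """Cross-fade two images: alpha=0 returns src, alpha=1 returns dst."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * ps + alpha * pd for ps, pd in zip(rs, rd)]
        for rs, rd in zip(src, dst)
    ]


# Toy 2x2 "lip shape" pictures, invented purely for this example.
closed_mouth: Image = [[0, 0], [0, 0]]
open_mouth: Image = [[10, 10], [10, 10]]
observed: Image = [[8, 9], [10, 7]]

best = nearest_category(observed, [closed_mouth, open_mouth])
halfway = alpha_blend(closed_mouth, open_mouth, 0.5)
```

Stepping `alpha` from 0 to 1 over successive frames would produce a gradual transition between the source and destination pictures, which is the usual way alpha blending is applied to animation.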


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shih, PY., Wang, JF., Chen, ZY. (2010). Kernel-Based Lip Shape Clustering with Phoneme Recognition for Real-Time Voice Driven Talking Face. In: Zhang, L., Lu, BL., Kwok, J. (eds) Advances in Neural Networks - ISNN 2010. ISNN 2010. Lecture Notes in Computer Science, vol 6064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13318-3_64

  • DOI: https://doi.org/10.1007/978-3-642-13318-3_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13317-6

  • Online ISBN: 978-3-642-13318-3

  • eBook Packages: Computer Science, Computer Science (R0)
