Abstract
This work describes a real-time, voice-driven method for synchronizing a speaker’s lip shape with the corresponding speech signal on low-bandwidth mobile devices. Phoneme recognition is generally regarded as an important task in a real-time lip-sync system. We propose a kernel-based lip shape clustering algorithm inspired by one-class support vector machines (SVM). Speakers with similar lip shapes are grouped into a cluster, and a cluster-dependent vowel phoneme is then constructed for each cluster. The sum of absolute differences (SAD) serves as the vowel lip shape likelihood for assigning lip shapes to categories. For lip-sync animation, the transparency levels of the source and destination lip shape pictures are then adjusted using alpha blending. The method outperforms the conventional CHMM method in phoneme error rate (PER): 8.78% versus 32.25%, respectively.
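As a rough illustration of two building blocks named in the abstract, the sketch below computes a sum of absolute differences between lip images and produces alpha-blended in-between frames. It is not the authors' implementation: the image representation (equal-sized 2D lists of grayscale values), the function names, and the linear alpha schedule are all assumptions made for this example, and the kernel-based clustering step is not covered.

```python
# Illustrative sketch only. Lip images are modelled as equal-sized 2D lists
# of grayscale values; every function name here is hypothetical.

def sad(a, b):
    """Sum of absolute differences between two equal-sized grayscale images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def nearest_template(image, templates):
    """Return the key of the template whose SAD to `image` is smallest."""
    return min(templates, key=lambda key: sad(image, templates[key]))


def alpha_blend(src, dst, alpha):
    """Blend two images: alpha=0.0 yields src, alpha=1.0 yields dst."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * ps + alpha * pd for ps, pd in zip(rs, rd)]
        for rs, rd in zip(src, dst)
    ]


def transition_frames(src, dst, steps):
    """Return `steps` frames from src to dst, endpoints included.

    The linear alpha schedule is an assumption of this sketch.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [alpha_blend(src, dst, i / (steps - 1)) for i in range(steps)]
```

In this sketch a lower SAD means a closer match, so `nearest_template` picks the lip shape category with the minimum value; how the paper converts SAD into a likelihood is not stated in the abstract.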
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shih, PY., Wang, JF., Chen, ZY. (2010). Kernel-Based Lip Shape Clustering with Phoneme Recognition for Real-Time Voice Driven Talking Face. In: Zhang, L., Lu, BL., Kwok, J. (eds) Advances in Neural Networks - ISNN 2010. ISNN 2010. Lecture Notes in Computer Science, vol 6064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13318-3_64
DOI: https://doi.org/10.1007/978-3-642-13318-3_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13317-6
Online ISBN: 978-3-642-13318-3
eBook Packages: Computer Science, Computer Science (R0)