Kernel-Based Lip Shape Clustering with Phoneme Recognition for Real-Time Voice Driven Talking Face

Shih, Po-Yi; Wang, Jhing-Fa; Chen, Zong-You

doi:10.1007/978-3-642-13318-3_64

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6064)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

This work describes a real-time, voice-driven method that synchronizes a speaker's lip shape with the corresponding speech signal on low-bandwidth mobile devices. Phoneme recognition is generally regarded as an important task in a real-time lip-sync system. The kernel-based lip shape clustering algorithm used here is inspired by one-class support vector machines (SVM). Speakers with similar lip shapes are grouped into a cluster, and a cluster-dependent vowel phoneme is then constructed for each cluster. The sum of absolute differences (SAD) serves as the vowel lip shape likelihood for assigning lip shapes to categories. For lip-sync animation, the source and destination lip shape pictures are then blended by adjusting their transparency level with alpha blending. The method outperforms the conventional CHMM method in phoneme error rate (PER): 8.78% versus 32.25%.
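The abstract names two image operations, SAD and alpha blending, without giving their details. The following is a minimal sketch of both, assuming small grayscale lip images stored as nested lists; it is not the authors' implementation. The function names, the nearest-template rule, and the linear alpha schedule are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumption: a lip image is a grayscale pixel grid (values 0-255), row-major.
Image = List[List[int]]


def sad(a: Image, b: Image) -> int:
    """Sum of absolute differences between two equally sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def nearest_template(lip: Image, templates: Sequence[Image]) -> int:
    """Index of the template with the smallest SAD to `lip`.

    Hypothetical assignment rule: a lower SAD is treated as a higher
    lip shape likelihood.
    """
    return min(range(len(templates)), key=lambda i: sad(lip, templates[i]))


def alpha_blend(src: Image, dst: Image, alpha: float) -> Image:
    """Per-pixel blend (1 - alpha) * src + alpha * dst, rounded to integers."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [int(round((1.0 - alpha) * ps + alpha * pd)) for ps, pd in zip(rs, rd)]
        for rs, rd in zip(src, dst)
    ]


def transition(src: Image, dst: Image, steps: int) -> List[Image]:
    """In-between frames from src to dst, both endpoints included.

    Assumption: alpha rises linearly from 0 to 1 across the frames.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [alpha_blend(src, dst, k / (steps - 1)) for k in range(steps)]
```

For example, with a "closed" and an "open" template, `nearest_template(frame, [closed, open_])` picks the closer lip shape, and `transition(closed, open_, 5)` yields five frames that fade from one shape to the other. A real system would likely operate on cropped, aligned mouth regions rather than raw pixel grids.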

Author information

Authors and Affiliations

Department of Electrical Engineering, National Cheng Kung University, No. 1 University Road, Tainan City, Taiwan
Po-Yi Shih, Jhing-Fa Wang & Zong-You Chen

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800, Dongchuan Road, 200240, Shanghai, China
Liqing Zhang & Bao-Liang Lu
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
James Kwok

About this paper

Cite this paper

Shih, PY., Wang, JF., Chen, ZY. (2010). Kernel-Based Lip Shape Clustering with Phoneme Recognition for Real-Time Voice Driven Talking Face. In: Zhang, L., Lu, BL., Kwok, J. (eds) Advances in Neural Networks - ISNN 2010. ISNN 2010. Lecture Notes in Computer Science, vol 6064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13318-3_64

DOI: https://doi.org/10.1007/978-3-642-13318-3_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13317-6
Online ISBN: 978-3-642-13318-3
eBook Packages: Computer Science (R0)