Abstract
Real-time classification algorithms are presented for visual mouth appearances (visemes), which correspond to phonemes and their speech contexts. They are used in the design of a talking head application. Two feature extraction procedures were evaluated. The first is based on a normalized triangle mesh covering the mouth area, with the color image texture vector indexed by barycentric coordinates. The second applies the Discrete Fourier Transform (DFT) to an image rectangle enclosing the mouth and retains a small block of DFT coefficients. The classifier was designed with an optimized Linear Discriminant Analysis (LDA) method that uses a two-singular-subspace approach. Despite its higher computational complexity (about three milliseconds per video frame on a Pentium IV 3.2 GHz), the DFT+LDA approach has practical advantages over the MESH+LDA classifier. First, its recognition rate is more than two percentage points higher (99.3% versus 97.2%). Second, automatic identification of the enclosing mouth rectangle is more robust than automatic identification of the mouth triangle mesh.
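The DFT+LDA pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mouth rectangle is assumed to be already detected, converted to grayscale and resized to a fixed shape; the "small block" is assumed to be the k x k coefficients with the lowest non-negative frequency indices (k = 4 is an arbitrary choice); and classical Fisher LDA with a small ridge term plus a nearest-centroid rule stands in for the optimized two-singular-subspace LDA, whose details are not given here. The function names `dft_block_features`, `fit_lda` and `predict` are hypothetical.

```python
import numpy as np


def dft_block_features(patch, k=4):
    """Feature vector from the k x k lowest-index block of the 2-D DFT.

    Assumptions (not from the paper): `patch` is a grayscale mouth
    rectangle already cropped and resized to a fixed shape, and the
    features are the real and imaginary parts of that coefficient block.
    """
    spectrum = np.fft.fft2(np.asarray(patch, dtype=float))
    block = spectrum[:k, :k]
    return np.concatenate([block.real.ravel(), block.imag.ravel()])


def fit_lda(X, y, ridge=1e-6):
    """Classical Fisher LDA: solve Sb w = lambda Sw w, keep C - 1 directions.

    The ridge term keeps Sw positive definite. This is a stand-in for,
    not an implementation of, the two-singular-subspace method.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]
    grand_mean = X.mean(axis=0)
    Sw = ridge * np.eye(d)  # within-class scatter (regularized)
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        centered = Xc - mc
        Sw += centered.T @ centered
        diff = (mc - grand_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Reduce the generalized problem to a symmetric one via Sw = L L^T:
    # (L^-1 Sb L^-T) v = lambda v, with w = L^-T v.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    _, V = np.linalg.eigh(L_inv @ Sb @ L_inv.T)  # eigenvalues ascending
    V = V[:, ::-1][:, : len(classes) - 1]        # keep the largest C - 1
    W = L_inv.T @ V                              # map back to feature space
    centroids = {c: X[y == c].mean(axis=0) @ W for c in classes}
    return W, centroids


def predict(W, centroids, x):
    """Assign x to the nearest class centroid in the LDA subspace."""
    z = np.asarray(x, dtype=float) @ W
    return min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))
```

In use, one would compute `dft_block_features` for every labeled training frame, call `fit_lda(X, y)` once, and then classify each new frame with `predict(W, centroids, dft_block_features(frame))`. Only a small block of coefficients is kept, so the LDA stage operates on a short vector (2k² values) regardless of the rectangle size.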
Keywords
- Linear Discriminant Analysis
- Recognition Rate
- Discrete Fourier Transform
- Triangle Mesh
- Singular Subspace
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leszczynski, M., Skarbek, W. (2005). Viseme Classification for Talking Head Application. In: Gagalowicz, A., Philips, W. (eds) Computer Analysis of Images and Patterns. CAIP 2005. Lecture Notes in Computer Science, vol 3691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11556121_95
DOI: https://doi.org/10.1007/11556121_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28969-2
Online ISBN: 978-3-540-32011-1
eBook Packages: Computer Science, Computer Science (R0)