Abstract
This paper describes a novel lipreading procedure based on dynamic programming. We propose a new method of outer lip contour extraction and representation. Lip shapes corresponding to a selected group of visemes are first extracted using dynamic programming and then approximated by B-splines. The coordinates of the B-spline control points form the final feature vector used for the viseme recognition task. Discontinuities in the lip gradient image are handled by the dynamic programming technique, which has the advantage of detecting the global minimum and consequently extracting the optimal lip contour. Experiments on Polish-language utterances show that seven classes of visemes can be recognized with 75% accuracy.
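As an illustration of why dynamic programming yields a globally optimal contour, the following minimal sketch finds the cheapest left-to-right path through a small cost grid. It is not the authors' implementation: the function name `min_cost_path`, the `max_step` smoothness constraint, and the idea of using a grid whose low values mark strong gradient are all assumptions made for the example.

```python
def min_cost_path(cost, max_step=1):
    """Return (total_cost, rows) for the cheapest path crossing every column.

    cost     -- 2-D list, cost[r][c]; low values stand for strong edges
                (hypothetical stand-in for an inverted gradient image)
    max_step -- largest allowed row change between neighbouring columns
    """
    n_rows, n_cols = len(cost), len(cost[0])
    # acc[r][c]: cheapest cost of any admissible path ending at (r, c)
    acc = [[0.0] * n_cols for _ in range(n_rows)]
    # back[r][c]: row chosen in column c - 1 on that cheapest path
    back = [[0] * n_cols for _ in range(n_rows)]

    for r in range(n_rows):
        acc[r][0] = cost[r][0]

    # Forward pass: extend the best partial paths one column at a time.
    for c in range(1, n_cols):
        for r in range(n_rows):
            lo = max(0, r - max_step)
            hi = min(n_rows - 1, r + max_step)
            best = min(range(lo, hi + 1), key=lambda p: acc[p][c - 1])
            acc[r][c] = cost[r][c] + acc[best][c - 1]
            back[r][c] = best

    # Backward pass: start from the cheapest end point and follow the links.
    end = min(range(n_rows), key=lambda r: acc[r][n_cols - 1])
    rows = [end]
    for c in range(n_cols - 1, 0, -1):
        rows.append(back[rows[-1]][c])
    rows.reverse()
    return acc[end][n_cols - 1], rows


if __name__ == "__main__":
    grid = [
        [1, 9, 9, 9],
        [9, 1, 9, 1],
        [9, 9, 1, 9],
    ]
    print(min_cost_path(grid))
```

Because every column is visited and the accumulated cost is minimised over all admissible paths, a column with no clear edge (a gap in the gradient image) is simply bridged by the cheapest continuation rather than breaking the contour.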
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Owczarek, A., Ślot, K. (2012). Lipreading Procedure Based on Dynamic Programming. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science, vol 7267. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29347-4_65
DOI: https://doi.org/10.1007/978-3-642-29347-4_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29346-7
Online ISBN: 978-3-642-29347-4
eBook Packages: Computer Science, Computer Science (R0)