Lipreading Procedure Based on Dynamic Programming

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7267)

Abstract

This paper describes a novel lipreading procedure based on dynamic programming. We propose a new method of outer lip contour extraction and representation. Lip shapes corresponding to a selected group of visemes are first extracted using dynamic programming and then approximated by B-splines. The coordinates of the B-spline control points form the final feature vector used for the viseme recognition task. Discontinuities in the lip gradient image are handled by the dynamic programming technique, which has the advantage of detecting the global minimum and consequently extracting the optimal lip contour. Experiments on Polish-language utterances show that seven classes of visemes can be recognized with 75% accuracy.
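
The abstract does not give the cost function or the search geometry, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical cost grid in which each column is one position along the contour, each row is a candidate location, and low values mark strong edges (for example, a negated gradient magnitude). The function name `min_cost_contour` and the parameter `max_step` are illustrative. Dynamic programming selects one row per column so that the summed cost is globally minimal while neighbouring columns stay within `max_step` rows of each other, which lets the path bridge columns where the gradient carries no edge information.

```python
from typing import List, Sequence


def min_cost_contour(cost: Sequence[Sequence[float]], max_step: int = 1) -> List[int]:
    """Return one row index per column that minimises the summed cost.

    Rows chosen in consecutive columns may differ by at most `max_step`,
    which keeps the path connected across columns with no clear edge.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    # total[r]: cheapest cost of any admissible path ending at row r
    # of the column processed so far.
    total = [float(cost[r][0]) for r in range(n_rows)]
    # back[r][c]: row in column c-1 that precedes row r on the best path.
    back = [[0] * n_cols for _ in range(n_rows)]

    for c in range(1, n_cols):
        new_total = [0.0] * n_rows
        for r in range(n_rows):
            lo, hi = max(0, r - max_step), min(n_rows - 1, r + max_step)
            best_prev = min(range(lo, hi + 1), key=lambda p: total[p])
            back[r][c] = best_prev
            new_total[r] = total[best_prev] + cost[r][c]
        total = new_total

    # Backtrack from the cheapest end point to recover the whole path.
    r = min(range(n_rows), key=lambda p: total[p])
    path = [r]
    for c in range(n_cols - 1, 0, -1):
        r = back[r][c]
        path.append(r)
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy cost grid: the middle column is a "gap" with no preferred row.
    toy_cost = [
        [5, 5, 5, 5, 1],
        [1, 1, 5, 1, 5],
        [5, 5, 5, 5, 5],
    ]
    contour = min_cost_contour(toy_cost, max_step=1)
    print(contour, sum(toy_cost[r][c] for c, r in enumerate(contour)))
```

In the toy grid the path follows the low-cost cells on both sides of the uninformative middle column, so the gap costs only that one column's penalty. In a pipeline like the one the abstract outlines, the resulting contour points would then be approximated by a B-spline whose control points serve as features; that step is not shown here.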



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Owczarek, A., Ślot, K. (2012). Lipreading Procedure Based on Dynamic Programming. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science, vol 7267. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29347-4_65

  • DOI: https://doi.org/10.1007/978-3-642-29347-4_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29346-7

  • Online ISBN: 978-3-642-29347-4

  • eBook Packages: Computer Science, Computer Science (R0)
