A New Visual Speech Recognition Approach for RGB-D Cameras

  • Conference paper
Image Analysis and Recognition (ICIAR 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8815)

Abstract

Visual speech recognition remains a challenging topic because of the variability of speaking characteristics. This paper proposes a new lipreading approach that recognizes isolated speech segments (words, digits, phrases, etc.) using both 2D image and depth data. The proposed system works in three consecutive steps: mouth region tracking and extraction, computation of motion and appearance descriptors (HOG and MBH), and classification with a Support Vector Machine (SVM). The approach was evaluated on three public databases (MIRALC, OuluVS, and CUAVE) in both speaker-dependent and speaker-independent settings. The recognition results show that lipreading can be performed effectively: the proposed approach outperforms recent works in the literature in the speaker-dependent setting and is competitive in the speaker-independent setting.
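As a rough illustration of the appearance-descriptor step, the sketch below computes a simplified HOG-style descriptor on an already-cropped mouth region and concatenates the result for the 2D image and the depth map. It is not the authors' implementation: the function names, the 4-pixel cell size, the 9 orientation bins, and the per-cell L2 normalization are all assumptions chosen for the example, and the MBH motion descriptor and the SVM classifier are left out.

```python
import math


def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `patch` is a 2D list of numbers; gradients use central differences,
    so only interior pixels contribute. The result is L2-normalized.
    """
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Fold the orientation into [0, pi): sign of the gradient is ignored.
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def frame_descriptor(image, cell=4, n_bins=9):
    """Concatenate per-cell orientation histograms over a grid of cells."""
    h, w = len(image), len(image[0])
    desc = []
    for top in range(0, h - cell + 1, cell):
        for left in range(0, w - cell + 1, cell):
            patch = [row[left:left + cell] for row in image[top:top + cell]]
            desc.extend(orientation_histogram(patch, n_bins))
    return desc


def rgbd_descriptor(gray, depth, cell=4, n_bins=9):
    """Hypothetical fusion: descriptor of the 2D image followed by the depth map."""
    return frame_descriptor(gray, cell, n_bins) + frame_descriptor(depth, cell, n_bins)


# Toy 8x8 mouth crops: intensity ramps along x, depth ramps along y.
gray = [[x for x in range(8)] for _ in range(8)]
depth = [[y for _ in range(8)] for y in range(8)]
descriptor = rgbd_descriptor(gray, depth)
```

In a full pipeline, one such vector per frame (together with a motion descriptor) would be aggregated over the utterance and passed to a classifier; how the paper performs that aggregation is not stated in the abstract.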


Correspondence to Ahmed Rekik.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rekik, A., Ben-Hamadou, A., Mahdi, W. (2014). A New Visual Speech Recognition Approach for RGB-D Cameras. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8815. Springer, Cham. https://doi.org/10.1007/978-3-319-11755-3_3

  • DOI: https://doi.org/10.1007/978-3-319-11755-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11754-6

  • Online ISBN: 978-3-319-11755-3

  • eBook Packages: Computer Science (R0)
