
Lip Reading using Simple Dynamic Features and a Novel ROI for Feature Extraction

Published: 28 November 2018

Abstract

Deaf and hard-of-hearing people rely heavily on lip reading to understand speech, demonstrating that humans can recover speech from visual cues alone. Automatic lip-reading systems work in a similar fashion, obtaining speech or text from visual information only, such as a video of a speaker's face. In this paper, an automatic lip-reading system for spoken digit recognition is presented. The system uses simple dynamic features obtained by computing difference images between consecutive frames of the input video. With this technique, word recognition rates of 83.79% and 65.58% are achieved in speaker-dependent and speaker-independent testing scenarios, respectively. A novel, extended region of interest (ROI) that includes the lower jaw and neck is also introduced; most lip-reading algorithms extract features from the mouth/lip region only. Compared with a mouth-only ROI, the proposed ROI improves performance by 4% in speaker-dependent tests and by 11% in speaker-independent tests.
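The dynamic features described in the abstract (absolute difference images between consecutive frames) and the idea of a downward-extended ROI can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names, the `jaw_neck_extension` factor, and the bounding-box convention are all assumptions made here for clarity.

```python
import numpy as np

def difference_images(frames):
    """Simple dynamic features: absolute difference images between
    consecutive frames.

    frames: array of shape (T, H, W), grayscale.
    Returns an array of shape (T-1, H, W).
    """
    frames = np.asarray(frames, dtype=np.float32)
    # np.diff along the time axis gives frame[t+1] - frame[t];
    # the absolute value keeps motion magnitude regardless of sign.
    return np.abs(np.diff(frames, axis=0))

def crop_extended_roi(frame, mouth_box, jaw_neck_extension=0.6):
    """Hypothetical extended ROI: enlarge the mouth bounding box
    downward to take in the lower jaw and neck, clipped to the frame.

    mouth_box: (x, y, w, h) with (x, y) the top-left corner.
    """
    x, y, w, h = mouth_box
    h_ext = int(h * (1.0 + jaw_neck_extension))  # grow only downward
    H, W = frame.shape[:2]
    return frame[y:min(y + h_ext, H), x:min(x + w, W)]

if __name__ == "__main__":
    # Three synthetic 4x4 frames: only the middle frame lights up,
    # so both difference images are uniformly 1.
    frames = np.zeros((3, 4, 4))
    frames[1] = 1
    print(difference_images(frames).shape)  # (2, 4, 4)
```

Each difference image can then be vectorized (or passed through a transform such as the DCT) to form the feature vector fed to the classifier.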


Cited By

  • (2024) Silent Speech Interface Using Lip-Reading Methods. Biomedical Engineering Science and Technology, 10.1007/978-3-031-54547-4_2, pp. 9-23. Online publication date: 15-Mar-2024.


    Published In

    SPML '18: Proceedings of the 2018 International Conference on Signal Processing and Machine Learning
    November 2018
    177 pages
    ISBN:9781450366052
    DOI:10.1145/3297067

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Automatic lip reading
    2. feature extraction
    3. visual speech recognition

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

