
Reducing viseme confusion in speech-reading

Published: 16 March 2016

Abstract

Speech-reading is an invaluable technique for people with hearing loss or those in adverse listening conditions (e.g., in a noisy restaurant or near children playing loudly). However, speech-reading is often difficult because identical mouth shapes (visemes) can produce several speech sounds (phonemes): the mapping from visemes to phonemes is one-to-many. This reduces comprehension, causing confusion and frustration during conversation. My doctoral research aims to design and evaluate a visualisation technique that displays textual representations of a speaker's phonemes to a speech-reader. By combining this visualisation with their pre-existing speech-reading ability, speech-readers should be able to disambiguate confusing viseme-to-phoneme mappings without shifting their focus from the speaker's face. This should improve comprehension and support natural conversation.
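
The following is a minimal Python sketch illustrating the one-to-many viseme-to-phoneme mapping the abstract describes. The viseme classes, phoneme symbols, and word transcriptions are simplified and illustrative only (published viseme groupings differ in detail); they are not the groupings or the visualisation technique used in this research.

```python
# Minimal sketch of why speech-reading is ambiguous: several phonemes share
# one viseme (mouth shape), so a viseme sequence maps to many candidate words.
# The viseme classes and transcriptions below are illustrative, not canonical.

VISEME_CLASSES = {
    "bilabial": {"p", "b", "m"},          # lips pressed together
    "labiodental": {"f", "v"},            # lower lip against upper teeth
    "open_mid": {"ae", "eh"},             # open-mouth vowels (simplified)
    "alveolar": {"t", "d", "n", "s", "z"},
}

# Map each phoneme back to the viseme class a speech-reader would see.
PHONEME_TO_VISEME = {
    phoneme: viseme
    for viseme, phonemes in VISEME_CLASSES.items()
    for phoneme in phonemes
}

def viseme_sequence(phonemes):
    """Collapse a phoneme sequence into the visemes visible on the lips."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)

# A toy lexicon with simplified phoneme transcriptions.
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fat": ["f", "ae", "t"],
}

def candidates(observed_visemes):
    """All lexicon words whose visemes match what was seen on the lips."""
    return [word for word, phonemes in LEXICON.items()
            if viseme_sequence(phonemes) == observed_visemes]

if __name__ == "__main__":
    seen = viseme_sequence(LEXICON["pat"])
    # "pat", "bat" and "mat" look identical on the lips; only "fat" differs,
    # so the speech-reader is left with several equally plausible words.
    print(candidates(seen))   # ['pat', 'bat', 'mat']
```

Displaying the actual phoneme (e.g., /p/ rather than the shared bilabial viseme) alongside the speaker's face is what lets the speech-reader collapse such a candidate set to a single word.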



  • Published in

    ACM SIGACCESS Accessibility and Computing
    January 2016, 56 pages
    ISSN: 1558-2337
    EISSN: 1558-1187
    DOI: 10.1145/2904092

    Copyright © 2016 held by the owner/author(s)

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

