Abstract
Speech-reading is an invaluable technique for people with hearing loss or those in adverse listening conditions (e.g., in a noisy restaurant, near children playing loudly). However, speech-reading is often difficult because identical mouth shapes (visemes) can produce several speech sounds (phonemes); there is a one-to-many mapping from visemes to phonemes. This decreases comprehension, causing confusion and frustration during conversation. My doctoral research aims to design and evaluate a visualisation technique that displays textual representations of a speaker's phonemes to a speech-reader. By combining my visualisation with their pre-existing speech-reading ability, speech-readers should be able to disambiguate confusing viseme-to-phoneme mappings without shifting their focus from the speaker's face. This will result in an improved level of comprehension, supporting natural conversation.
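The one-to-many viseme-to-phoneme mapping described above can be sketched as a simple lookup. This is an illustrative example only, not the paper's model: the viseme labels and groupings below are simplified assumptions, using the well-known case of the bilabial phonemes /p/, /b/, and /m/, which look identical on the lips.

```python
# Illustrative sketch (assumed, simplified viseme groupings — not the
# paper's model): one viseme maps to several candidate phonemes, which
# is what makes speech-reading ambiguous.
VISEME_TO_PHONEMES = {
    "bilabial": ["p", "b", "m"],   # "pat", "bat", "mat" look alike on the lips
    "labiodental": ["f", "v"],     # "fan" vs. "van"
}

def candidate_phonemes(viseme):
    """Return every phoneme a speech-reader must disambiguate for a viseme."""
    return VISEME_TO_PHONEMES.get(viseme, [])

print(candidate_phonemes("bilabial"))  # ['p', 'b', 'm'] — three-way ambiguity
```

A phoneme-level visualisation, as proposed above, would tell the speech-reader which of these candidates the speaker actually produced.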