Abstract
Speech-reading is an invaluable technique for people with hearing loss or those in adverse listening conditions (e.g., in a noisy restaurant, near children playing loudly). However, speech-reading is often difficult because identical mouth shapes (visemes) can produce several speech sounds (phonemes); there is a one-to-many mapping from visemes to phonemes. This decreases comprehension, causing confusion and frustration during conversation. My doctoral research aims to design and evaluate a visualisation technique that displays textual representations of a speaker's phonemes to a speech-reader. By combining my visualisation with their pre-existing speech-reading ability, speech-readers should be able to disambiguate confusing viseme-to-phoneme mappings without shifting their focus from the speaker's face. This will result in an improved level of comprehension, supporting natural conversation.
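The one-to-many viseme-to-phoneme mapping described above can be sketched as a simple lookup. This is an illustrative example only, not the paper's model: the viseme labels and groupings below are simplified assumptions, using the well-known case of the bilabial phonemes /p/, /b/, and /m/, which look identical on the lips.

```python
# Illustrative sketch (assumed, simplified viseme groupings — not the
# paper's model): one viseme maps to several candidate phonemes, which
# is what makes speech-reading ambiguous.
VISEME_TO_PHONEMES = {
    "bilabial": ["p", "b", "m"],   # "pat", "bat", "mat" look alike on the lips
    "labiodental": ["f", "v"],     # "fan" vs. "van"
}

def candidate_phonemes(viseme):
    """Return every phoneme a speech-reader must disambiguate for a viseme."""
    return VISEME_TO_PHONEMES.get(viseme, [])

print(candidate_phonemes("bilabial"))  # ['p', 'b', 'm'] — three-way ambiguity
```

A phoneme-level visualisation, as proposed above, would tell the speech-reader which of these candidates the speaker actually produced.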