Abstract
Evidence from the study of human language understanding suggests that our ability to perceive visible speech can greatly influence our ability to understand and remember spoken language. A view of the speaker's face aids the perception of ambiguous or noisy speech and supports the cognitive processing of speech, leading to better understanding and recall. Some of these effects have been replicated using computer-synthesized visual and auditory speech. Thus, it appears that when giving an interface a voice, it may be best to give it a face too.
Thompson, L.A., Ogden, W.C. Visible speech improves human language understanding: Implications for speech processing systems. Artif Intell Rev 9, 347–358 (1995). https://doi.org/10.1007/BF00849044