To Mix or Not to Mix Synthetic Speech and Human Speech? Contrasting Impact on Judge-Rated Task Performance versus Self-Rated Performance and Attitudinal Responses

International Journal of Speech Technology

Abstract

Since it is impractical to prerecord human speech for dynamic content such as email messages and news, many commercial speech applications use recorded human speech for fixed content (e.g., system prompts) and synthetic speech for dynamic content. However, mixing human speech and synthetic speech may not be optimal from a consistency perspective. A two-condition between-participants experiment (N = 24) was conducted to compare two versions of a telephony application for Personal Information Management (PIM). In the first condition, all the system output was delivered with synthetic speech. In the second condition, users heard a mix of human speech and synthetic speech. Users managed several email and calendar tasks. Users' task performance was rated by two independent judges. Users' self-ratings of task performance and their attitudinal responses were also measured by means of questionnaires. Users interacting with the interface that used only synthetic speech performed the tasks significantly better, while users interacting with the mixed-speech interface thought they did better and had more positive attitudinal responses. A consistency framework drawn from human psychological processing is offered to explain the difference in task performance. Cognitive processing and attitudinal response are differentiated. Design implications and directions for future research are suggested.


Cite this article

Gong, L., Lai, J. To Mix or Not to Mix Synthetic Speech and Human Speech? Contrasting Impact on Judge-Rated Task Performance versus Self-Rated Performance and Attitudinal Responses. International Journal of Speech Technology 6, 123–131 (2003). https://doi.org/10.1023/A:1022382413579
