ABSTRACT
Current implementations of real-time speech-to-speech (S2S) translation systems for intercultural collaboration have focused mainly on the accuracy of recognition and translation. Typically, the translated utterance is presented to users through text-to-speech (TTS) without conveying cultural nuances in the tone of voice. This study investigates whether there are cross-cultural markers of variation in voice dynamics and, if so, whether these affect user satisfaction. Based on subjective user evaluations (Chinese and English), we conclude that there are salient cross-cultural voice markers relevant to the interaction of culture and system design, with a noticeable impact on user satisfaction in TTS and S2S systems.
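The abstract's point can be pictured as a three-stage pipeline (ASR → MT → TTS) in which the synthesis stage is parameterized by culture-specific voice dynamics rather than a single neutral voice. The sketch below is purely illustrative: every function (`recognize`, `translate`, `synthesize`) and the `VOICE_PROFILES` table are hypothetical stand-ins, not components of the system evaluated in the paper.

```python
# Hypothetical sketch of an S2S pipeline whose TTS stage applies
# culture-specific prosody settings. All components are stand-ins.

def recognize(audio):
    """ASR stand-in: pretend we decoded this utterance."""
    return "ni hao"

def translate(text, src, tgt):
    """MT stand-in: a toy lexicon instead of a real translator."""
    lexicon = {"ni hao": "hello"}
    return lexicon.get(text, text)

def synthesize(text, voice_params):
    """TTS stand-in: returns a description instead of real audio."""
    return f"<audio:{text} pitch={voice_params['pitch']} rate={voice_params['rate']}>"

# Hypothetical prosody profiles, reflecting the paper's claim that
# voice dynamics (not just translated content) carry cultural markers.
VOICE_PROFILES = {
    "zh": {"pitch": 1.1, "rate": 0.95},
    "en": {"pitch": 1.0, "rate": 1.0},
}

def s2s(audio, src="zh", tgt="en"):
    text = recognize(audio)
    translated = translate(text, src, tgt)
    return synthesize(translated, VOICE_PROFILES[tgt])

print(s2s(b""))  # prints "<audio:hello pitch=1.0 rate=1.0>"
```

The design choice under study is the `VOICE_PROFILES` indirection: a conventional S2S system would call `synthesize` with one fixed voice, whereas culturally aware synthesis selects prosody by target language and culture.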