Abstract
Recent improvements in audiovisual alignment for a translating videophone are presented. A method for audiovisual alignment in the target language is proposed, and the process of audiovisual speech synthesis is described. The proposed method has been evaluated in the VideoTRAN translating videophone environment, where an H.323 software-client translating videophone allows for the transmission and translation of a set of multimodal verbal and nonverbal cues in a multilingual face-to-face communication setting. An extension of the subjective evaluation metrics of fluency and adequacy, which are commonly used in subjective machine translation evaluation tests, is proposed for use in an audiovisual translation environment.
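The proposed evaluation extends the standard fluency/adequacy judging scheme with an audiovisual dimension. A minimal sketch of how such per-judge ratings might be aggregated is shown below; the dimension names, the 1-5 scale, and the sample scores are illustrative assumptions, not values taken from the paper.

```python
from statistics import mean

# Hypothetical rating sheet: each judge scores a translated clip on a
# 1-5 scale for fluency, adequacy, and (the proposed extension) an
# audiovisual dimension such as lip-sync naturalness.
ratings = [
    {"fluency": 4, "adequacy": 5, "audiovisual": 3},
    {"fluency": 5, "adequacy": 4, "audiovisual": 4},
    {"fluency": 3, "adequacy": 4, "audiovisual": 4},
]

def summarize(ratings):
    """Average each dimension over all judges, as in standard MT tests."""
    dims = ratings[0].keys()
    return {d: mean(r[d] for r in ratings) for d in dims}

print(summarize(ratings))
```

In a real test, each system output would be rated by several judges and the per-dimension means compared across systems.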
© 2009 Springer-Verlag Berlin Heidelberg
Gros, J.Ž., Mihelič, A. (2009). Audiovisual Alignment in a Face-to-Face Conversation Translation Framework. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_8
Print ISBN: 978-3-642-04390-1
Online ISBN: 978-3-642-04391-8