Audiovisual Alignment in a Face-to-Face Conversation Translation Framework

Gros, Jerneja Žganec; Mihelič, Aleš

doi:10.1007/978-3-642-04391-8_8

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5707)

Included in the following conference series:

European Workshop on Biometrics and Identity Management

Abstract

Recent improvements in audiovisual alignment for a translating videophone are presented. A method for audiovisual alignment in the target language is proposed, and the process of audiovisual speech synthesis is described. The method was evaluated in the VideoTRAN translating videophone environment, in which an H.323 software client transmits and translates a set of multimodal verbal and nonverbal cues in a multilingual face-to-face communication setting. An extension of the subjective fluency and adequacy metrics commonly used in machine translation evaluation is proposed for use in an audiovisual translation environment.

Author information

Authors and Affiliations

Alpineon Research and Development, Ulica Iga Grudna 15, 1000, Ljubljana, Slovenia
Jerneja Žganec Gros & Aleš Mihelič

Editor information

Editors and Affiliations

Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco Tomas y Valiente 11, 28049, Madrid, Spain
Julian Fierrez & Javier Ortega-Garcia
Second University of Naples, and IIASS, Via Vivaldi 43, 81100, Caserta, Italy
Anna Esposito
EPFL, Speech Processing and Biometrics Group, EPFL-STI-IEL-LIDIAP, ELE 233, Station 11, 1015, Lausanne, Switzerland
Andrzej Drygajlo
Escola Universitària Politècnica de Mataró, Avda. Puig i Cadafalch 101-111, 08303, Mataro (Barcelona), Spain
Marcos Faundez-Zanuy

Cite this paper

Gros, J.Ž., Mihelič, A. (2009). Audiovisual Alignment in a Face-to-Face Conversation Translation Framework. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_8

DOI: https://doi.org/10.1007/978-3-642-04391-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04390-1
Online ISBN: 978-3-642-04391-8
eBook Packages: Computer Science, Computer Science (R0)
