Audiovisual Alignment in a Face-to-Face Conversation Translation Framework

  • Conference paper
Biometric ID Management and Multimodal Communication (BioID 2009)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5707)

Abstract

This paper presents recent improvements in audiovisual alignment for a translating videophone. It proposes a method for audiovisual alignment in the target language and describes the process of audiovisual speech synthesis. The method was evaluated in the VideoTRAN translating videophone environment, in which an H.323 software client transmits and translates a set of multimodal verbal and nonverbal cues in a multilingual face-to-face communication setting. The paper also proposes an extension of fluency and adequacy, the subjective metrics commonly used in machine translation evaluation, for use in an audiovisual translation environment.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gros, J.Ž., Mihelič, A. (2009). Audiovisual Alignment in a Face-to-Face Conversation Translation Framework. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_8

  • DOI: https://doi.org/10.1007/978-3-642-04391-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04390-1

  • Online ISBN: 978-3-642-04391-8

  • eBook Packages: Computer Science, Computer Science (R0)
