VideoTRAN: A Translation Framework for Audiovisual Face-to-Face Conversations

  • Conference paper
In: Verbal and Nonverbal Communication Behaviours

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4775)

Abstract

Face-to-face communication remains the most powerful form of human interaction. Electronic devices can never fully replace the intimacy and immediacy of people conversing in the same room, or at least via a videophone. Facial expressions and vocal intonation provide many subtle cues that let us know how what we are saying is affecting the other person. Transmitting these nonverbal cues is very important when translating conversations from a source language into a target language. This chapter describes VideoTRAN, a conceptual framework for translating audiovisual face-to-face conversations. A simple method for audiovisual alignment in the target language is proposed, and the process of audiovisual speech synthesis is described. The VideoTRAN framework has been tested in a translating videophone: an H.323 software-client translating videophone that allows for the transmission and translation of a set of multimodal verbal and nonverbal cues in a multilingual face-to-face communication setting.
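The abstract describes a pipeline in which translated speech must be re-aligned with the original video so that the target-language audio keeps the source segment's timing. The paper publishes no code, so the sketch below is purely illustrative: the class and function names (`AVSegment`, `translate_text`, `align_target`) and the toy lexicon are hypothetical stand-ins for the stages the abstract names, not the authors' implementation.

```python
# Illustrative sketch only; all names below are hypothetical placeholders
# for the translation and audiovisual-alignment stages the abstract describes.
from dataclasses import dataclass

@dataclass
class AVSegment:
    text: str        # words spoken in the segment
    duration: float  # audio duration in seconds
    visemes: list    # lip-shape labels aligned to the audio

def translate_text(text: str) -> str:
    """Stand-in for the machine-translation stage (toy lexicon)."""
    lexicon = {"hello": "dober dan"}
    return lexicon.get(text.lower(), text)

def align_target(segment: AVSegment, target_text: str) -> AVSegment:
    """Toy audiovisual alignment: reuse the source segment's duration so the
    synthesized target speech stays in sync with the original video; derive
    placeholder visemes from the target words."""
    return AVSegment(text=target_text,
                     duration=segment.duration,
                     visemes=[word[0] for word in target_text.split()])

def translate_segment(segment: AVSegment) -> AVSegment:
    """One segment through the pipeline: translate, then re-align."""
    return align_target(segment, translate_text(segment.text))

source = AVSegment(text="hello", duration=0.8, visemes=["h", "e"])
target = translate_segment(source)
print(target.text, target.duration)  # translated text, source timing kept
```

In a real system the alignment step would stretch or compress the synthesized target speech to the source duration and drive a talking-head model from the visemes; the sketch only shows the data flow.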





Editor information

Anna Esposito, Marcos Faundez-Zanuy, Eric Keller, Maria Marinaro


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gros, J.Ž. (2007). VideoTRAN: A Translation Framework for Audiovisual Face-to-Face Conversations. In: Esposito, A., Faundez-Zanuy, M., Keller, E., Marinaro, M. (eds) Verbal and Nonverbal Communication Behaviours. Lecture Notes in Computer Science(), vol 4775. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76442-7_19

  • DOI: https://doi.org/10.1007/978-3-540-76442-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76441-0

  • Online ISBN: 978-3-540-76442-7

  • eBook Packages: Computer Science, Computer Science (R0)
