
The Importance of Visual Context Clues in Multimedia Translation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6941)

Abstract

As video-sharing websites such as YouTube proliferate, the ability to rapidly translate video clips into multiple languages has become essential to extending their global reach and impact. Moreover, the ability to provide closed captioning in a variety of languages is paramount to reaching a wider variety of viewers. We investigate the importance of visual context clues by comparing translations made from multimedia clips (which allow translators to make use of visual context clues) with translations made from the corresponding written transcripts alone (which do not). Additionally, we contrast translations produced by crowdsourced workers with those produced by professional translators in terms of cost and quality. Finally, we evaluate several genres of multimedia to examine the effects of visual context clues on each and present the results through heat maps.
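The quality comparison described above can be illustrated with a simple token-overlap score. The Python fragment below is a minimal, hypothetical sketch: it assumes each candidate translation is scored against a professional reference translation with a unigram F-measure, a deliberately simplified stand-in for the kind of automatic evaluation the paper performs (the abstract does not name a metric, and every sentence in the example is invented).

    from collections import Counter

    def f_measure(candidate: str, reference: str) -> float:
        # Unigram precision/recall F1 between candidate and reference tokens.
        cand = Counter(candidate.lower().split())
        ref = Counter(reference.lower().split())
        overlap = sum((cand & ref).values())  # clipped unigram matches
        if overlap == 0:
            return 0.0
        precision = overlap / sum(cand.values())
        recall = overlap / sum(ref.values())
        return 2 * precision * recall / (precision + recall)

    # Toy comparison: a translation made while watching the clip (visual
    # context available) versus one made from the written transcript alone,
    # both scored against a professional reference translation.
    reference = "she points at the red door before leaving the room"
    with_video = "she points to the red door before she leaves the room"
    transcript_only = "she indicates the exit before leaving"

    print("with visual context:", round(f_measure(with_video, reference), 3))
    print("transcript only:    ", round(f_measure(transcript_only, reference), 3))

In practice, automatic metrics such as METEOR build on this kind of overlap count by adding stemming and synonym matching before computing the final score.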




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Harris, C.G., Xu, T. (2011). The Importance of Visual Context Clues in Multimedia Translation. In: Forner, P., Gonzalo, J., Kekäläinen, J., Lalmas, M., de Rijke, M. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2011. Lecture Notes in Computer Science, vol 6941. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23708-9_13


  • DOI: https://doi.org/10.1007/978-3-642-23708-9_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23707-2

  • Online ISBN: 978-3-642-23708-9

  • eBook Packages: Computer Science (R0)
