Abstract
As video-sharing websites such as YouTube proliferate, the ability to rapidly translate video clips into multiple languages has become essential to extending their global reach and impact. Likewise, providing closed captioning in a variety of languages is paramount for reaching a broader audience. We investigate the importance of visual context clues by comparing translations of multimedia clips (in which transcriptionists can draw on visual context clues) with translations of the corresponding written transcripts (in which they cannot). Additionally, we contrast translations produced by crowdsourced workers with those made by professional translators on cost and quality. Finally, we evaluate several genres of multimedia to examine the effect of visual context clues on each, illustrating the results with heat maps.
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Harris, C.G., Xu, T. (2011). The Importance of Visual Context Clues in Multimedia Translation. In: Forner, P., Gonzalo, J., Kekäläinen, J., Lalmas, M., de Rijke, M. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2011. Lecture Notes in Computer Science, vol 6941. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23708-9_13
DOI: https://doi.org/10.1007/978-3-642-23708-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23707-2
Online ISBN: 978-3-642-23708-9
eBook Packages: Computer Science (R0)