DOI: 10.1145/2596695.2596713

Research article

Introducing game elements in crowdsourced video captioning by non-experts

Published: 07 April 2014

ABSTRACT

Video captioning can increase the accessibility of information for people who are deaf or hard-of-hearing, and can also benefit second-language learners and reading-deficient students. We propose a caption editing system that harvests crowdsourced work for the useful task of video captioning. To make the task an engaging activity, its interface incorporates game-like elements. Non-expert users submit transcriptions for short video segments against a countdown timer, in either a "type" or a "fix" mode, to score points. Transcriptions from multiple users are aligned and merged to form the final captions. Preliminary results with 42 participants and 578 short video segments show that the Word Error Rate of the merged captions with two users per segment improved from 20.7% for the ASR output to 16%. Finally, we discuss work in progress both to improve the accuracy of the collected data and to increase crowd engagement.
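The Word Error Rate figures reported above follow the standard definition: the minimum number of word substitutions, insertions, and deletions needed to turn the hypothesis transcript into the reference, divided by the reference length. A minimal sketch of that metric (a textbook Levenshtein-distance formulation, not the authors' own scoring code) looks like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming word-level edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)


# Two deleted words out of six reference words -> WER of 2/6.
print(wer("the cat sat on the mat", "the cat sat mat"))
```

A drop from 20.7% to 16% WER therefore means the merged two-user captions contain roughly a quarter fewer word errors per reference word than the raw ASR output.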


Published in

W4A '14: Proceedings of the 11th Web for All Conference
April 2014, 192 pages
ISBN: 9781450326513
DOI: 10.1145/2596695
Copyright © 2014 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

W4A '14 paper acceptance rate: 6 of 14 submissions, 43%. Overall acceptance rate: 171 of 371 submissions, 46%.
