ABSTRACT
We describe an initial attempt to develop a common platform for adding audio descriptions (ADs) to online videos so that blind and visually impaired people can enjoy such material. Speech synthesis technology allows content providers to offer ADs at minimal cost. We store the AD as external metadata so that it remains independent of the video format; this external approach also lets third-party supporters add ADs to any online video. Our technology includes an authoring tool for writing AD scripts, a Web browser add-on that synthesizes ADs synchronized with the original videos, and a text-based format for exchanging AD scripts.
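To illustrate how an external, format-independent AD script might drive synchronized playback, here is a minimal sketch. The timestamped line format and the function names are hypothetical assumptions for illustration only; the paper's actual exchange format is not reproduced here.

```python
# Hypothetical external AD script: one cue per line, "MM:SS.mmm<TAB>description".
# A browser add-on could look up the active cue at the current playback time
# and hand its text to a speech synthesizer.
import bisect

def parse_script(text):
    """Parse 'MM:SS.mmm<TAB>description' lines into sorted (seconds, text) cues."""
    cues = []
    for line in text.strip().splitlines():
        stamp, desc = line.split("\t", 1)
        minutes, seconds = stamp.split(":")
        cues.append((int(minutes) * 60 + float(seconds), desc))
    cues.sort()
    return cues

def cue_at(cues, t):
    """Return the description whose start time is the latest one at or before t."""
    times = [start for start, _ in cues]
    i = bisect.bisect_right(times, t) - 1
    return cues[i][1] if i >= 0 else None

script = (
    "00:05.000\tA reporter stands outside the courthouse.\n"
    "00:12.500\tThe camera pans to the crowd."
)
cues = parse_script(script)
print(cue_at(cues, 13.0))  # → The camera pans to the crowd.
```

Because the cues live outside the video file, the same script could annotate the same clip wherever it is hosted, which is what makes third-party authoring possible.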