Abstract
Automating the generation of audio descriptions (AD) for blind and visually impaired (BVI) people is a difficult task that involves several challenges: identifying gaps in the dialogue; describing the essential visual elements; summarizing the descriptions and fitting them into the dialogue gaps; generating an AD narration track; and synchronizing it with the main soundtrack. In our previous work (Campos et al. [6]), we proposed a solution for automatic AD script generation, named CineAD, which uses the movie’s script as the basis for AD generation. This article proposes extending that solution to complement the information extracted from the script, and to reduce the dependency on it, by classifying visual information from the video. To assess the viability of the proposed solution, we implemented a proof of concept and evaluated it with 11 blind users. The results showed that the solution generates a more succinct and objective AD while achieving a level of user understanding similar to that of our previous work. Thus, the solution can provide relevant information to blind users while using less video time for descriptions.
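The first challenge listed above, identifying gaps in the dialogue, can be illustrated with a minimal sketch: given the start and end times of dialogue intervals (for example, taken from subtitle timestamps), find the silent stretches long enough to hold an AD narration. The function name, the interval representation, and the minimum-gap threshold are illustrative assumptions, not part of CineAD.

```python
def find_dialogue_gaps(dialogue, total_duration, min_gap=1.5):
    """Return (start, end) pairs of silence between dialogue intervals.

    dialogue: list of (start, end) times in seconds, possibly unsorted.
    total_duration: length of the video in seconds.
    min_gap: shortest silence (seconds) considered usable for an AD cue;
             the 1.5 s value is an assumed threshold for illustration.
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(dialogue):
        # A gap opens between the end of the previous speech and this start.
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # max() tolerates overlapping intervals
    # Trailing silence after the last line of dialogue.
    if total_duration - cursor >= min_gap:
        gaps.append((cursor, total_duration))
    return gaps

# Example: two lines of dialogue in a 30-second clip.
dialogue = [(2.0, 6.0), (10.0, 14.0)]
print(find_dialogue_gaps(dialogue, 30.0))
# → [(0.0, 2.0), (6.0, 10.0), (14.0, 30.0)]
```

A later step of the pipeline (summarizing and fitting descriptions) would then budget each description's narration length against the duration of the gap it is assigned to.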
References
- [1] . 2019. The Audio Description Project. Retrieved from https://www.acb.org/adp/ad.html.
- [2] . 2018. Egocentric video description based on temporally-linked sequences. J. Vis. Commun. Image Repres. 50 (2018), 205–216.
- [3] . 2020. Global prevalence of blindness and distance and near vision impairment in 2020: Progress towards the vision 2020 targets and what the future holds. Investig. Ophthalm. Vis. Sci. 61 (2020).
- [4] . 2020. Comparing Human and Automated Approaches to Visual Storytelling. 159–196.
- [5] . 2016. Web prototype for creating descriptions and playing videos with audio description using a speech synthesizer. In Proceedings of the 8th Euro American Conference on Telematics and Information Systems (EATIS’16). 1–7.
- [6] . 2020. CineAD: A system for automated audio description script generation for the visually impaired. Univ. Access Inf. Soc. 19 (2020), 99–111.
- [7] . 2009. Accessible videodescription on-demand. In Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS’09). ACM, New York, NY, 221–222.
- [8] . 2017. Video captioning via sentence augmentation and spatio-temporal attention. In Computer Vision—ACCV 2016 Workshops. Springer International Publishing, Cham, 269–286.
- [9] . 2009. ImageNet: A large-scale hierarchical image database. In Proceedings of the Conference on Computer Vision and Pattern Recognition.
- [10] . 2016. Accessibility in digital cinema: A proposal for generation and distribution of audio description. In Proceedings of the 22nd Brazilian Symposium on Multimedia and the Web (Webmedia’16). ACM, New York, NY, 119–126.
- [11] . 2017. Automated audio captioning with recurrent neural networks. CoRR abs/1706.10006 (2017).
- [12] . 2013. Towards the usage of pauses in audio-described videos. In Proceedings of the 10th International Cross Disciplinary Conference on Web Accessibility (W4A’13). ACM, New York, NY.
- [13] . 2016. Audio description and technologies: Study on the semi-automatisation of the translation and voicing of audio descriptions. Ph.D. Dissertation. Universitat Autònoma de Barcelona, Spain.
- [14] . 2012. Development of audio video describer using narration to visualize movie film for blind and visually impaired children. In Proceedings of the International Conference on Computer and Information Science (ICCIS’12). 1068–1072.
- [15] L. Gagnon, S. Foucher, M. Heritier, et al. 2009. Towards computer-vision software tools to increase production and accessibility of video description for people with vision loss. Univ. Access Inf. Soc. 8 (2009), 199–218.
- [16] . 2010. Un Corpus de Cine. Fundamentos Teoricos de la Audiodescripcion (A Corpus of Cinema. Theoretical Foundations of Audio Description). Universidad de Granada, Proyecto Tracce. 13–56.
- [17] . 2018. Study on automated audio descriptions overlapping live television commentary. In Computers Helping People with Special Needs. Springer International Publishing, Cham, 220–224.
- [18] . 2018. A bilingual scene-to-speech mobile based application. In Proceedings of the International Conference on Computer and Applications (ICCA’18), Beirut, 1–240.
- [19] . 2010. Describing online videos with text-to-speech narration. In Proceedings of the International Cross Disciplinary Conference on Web Accessibility (W4A’10). ACM, New York, NY.
- [20] . 2010. Are synthesized video descriptions acceptable? In Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS’10). ACM, New York, NY, 163–170.
- [21] . 2002. The Semi-automatic Generation of Audio Description from Screenplays. Technical Report CS-06-05. Department of Computing, University of Surrey.
- [22] . 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision.
- [23] . 2017. Hierarchical & multimodal video captioning. Comput. Vis. Image Underst. 163, C (Oct. 2017), 113–125.
- [24] . 2018. Audio description in the UK: What works, what doesn’t, and understanding the need for personalising access. Brit. J. Vis. Impair. 36 (Aug. 2018).
- [25] . 2011. REST API Design Rulebook. O’Reilly Media, Sebastopol.
- [26] Iwona Mazur. 2020. Audio description: Concepts, theories and research approaches. In Ł. Bogucki and M. Deckert (Eds.), The Palgrave Handbook of Audiovisual Translation and Media Accessibility. Palgrave Studies in Translating and Interpreting. Palgrave Macmillan, Cham.
- [27] . 2020. A functional approach to audio description. J. Audiovis. Transl. 3, 2 (Dec. 2020), 226–245.
- [28] . 2020. Temporal sub-sampling of audio feature sequences for automated audio captioning. arXiv preprint arXiv:2007.02676.
- [29] . 2011. Audiodescricao como Tecnologia Assistiva para o Acesso ao Conhecimento por Pessoas Cegas (Audio Description as Assistive Technology for Access to Knowledge for the Blind). Pandion, Florianopolis, 191–232.
- [30] . 2016. Inclusive approaches for audiovisual translation production in interactive television (iTV). In Proceedings of the 7th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion (DSAI’16). ACM, New York, NY, 146–153.
- [31] . 2017. Automatic video descriptor for human action recognition. In Proceedings of the National Information Technology Conference (NITC’17). 61–67.
- [32] . 2016. YOLO9000: Better, faster, stronger. CoRR abs/1612.08242 (2016).
- [33] . 2016. Audio description of videos for people with visual disabilities. In Universal Access in Human-Computer Interaction: Users and Context Diversity. Springer International Publishing, Cham, 505–515.
- [34] . 2011. Text-to-speech audio description: Towards wider availability of AD. J. Spec. Transl. 15 (2011), 142–162.
- [35] . 2014. Going deeper with convolutions. CoRR abs/1409.4842 (2014).
- [36] Asociación Española de Normalización. 2005. UNE-153020: Audiodescripción para Personas con Discapacidad Visual. Requisitos para la audiodescripción y elaboración de audioguías (Audio description for visually impaired people. Guidelines for audio description procedures and for the preparation of audio guides). Technical Report. AENOR. Available at: www.une.org/encuentra-tu-norma/busca-tu-norma/norma?c=N0032787.
- [37] . 2019. Blindness and Vision Impairment. Retrieved from http://www.who.int/news-room/fact-sheets/detailblindness-and-visual-impairment.
- [38] . 2019. Semantic-filtered soft-split-aware video captioning with audio-augmented feature. Neurocomputing 357 (2019), 24–35.
- [39] . 2016. First-feed LSTM model for video description. J. China Univ. Posts Telecommun. 23, 3 (2016), 89–93.
Index Terms
- Machine Generation of Audio Description for Blind and Visually Impaired People