DOI: 10.1145/3663548.3688504

Poster

Towards a Rich Format for Closed-Captioning

Published: 27 October 2024

Abstract

Closed-captioning is an essential part of viewing audio-visual content for many people, including those who are D/deaf and Hard-of-Hearing. Traditional closed-captioning systems generally consist of a single track of timed text that offers limited options for personalization. Research into extending the capabilities of captioning, such as affective, poetic, and customizable captions, has shown that a subset of users desires these features, but only in specific contexts. However, because it is difficult to create custom stimulus videos with such captioning systems, comparisons between systems and longitudinal studies have not been pursued. This demo paper introduces Rich Captions, a structured system that allows a single closed-caption file to be tagged with additional information, which can then be flexibly leveraged to render customizable, creative, and poetic captions from the same file. Additionally, we introduce the Rich Caption Editor, a free, open-source software system designed to author, edit, and render rich captions. The system design was informed by a formative design workshop with closed-captioning researchers and advocates. The current design allows researchers to generate reproducible stimuli for closed-captioning studies. Once the design space and user preferences are better understood, the rich captioning framework could be refined to serve a general audience.
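The abstract does not reproduce the Rich Captions schema itself, so the TypeScript sketch below is purely illustrative of the core idea it describes: a single caption file carries optional tags, and a viewer-side renderer decides which tags to surface. Every name here (`RichCue`, `renderCue`, the `emotion`, `intensity`, and `soundDescription` fields) is a hypothetical assumption, not the Rich Caption Editor's actual format.

```typescript
// Hypothetical sketch of a tagged "rich caption" cue. Field names are
// assumptions for illustration only, not the paper's actual schema.
interface RichCue {
  start: number;              // cue start time, in seconds
  end: number;                // cue end time, in seconds
  text: string;               // plain caption text
  speaker?: string;           // optional speaker label
  emotion?: string;           // optional affect tag, e.g. "furious"
  intensity?: number;         // optional affect intensity in [0, 1]
  soundDescription?: string;  // optional non-speech sound description
}

// A viewer-side preference profile: the same file can be rendered
// differently depending on which tags the viewer wants surfaced.
interface RenderPrefs {
  showSpeaker: boolean;
  showAffect: boolean;
  showNonSpeech: boolean;
}

// Render one cue as plain text according to the viewer's preferences.
// A richer renderer could instead map intensity to typography or color.
function renderCue(cue: RichCue, prefs: RenderPrefs): string {
  const parts: string[] = [];
  if (prefs.showSpeaker && cue.speaker) parts.push(`${cue.speaker}:`);
  parts.push(cue.text);
  if (prefs.showAffect && cue.emotion) parts.push(`[${cue.emotion}]`);
  if (prefs.showNonSpeech && cue.soundDescription) {
    parts.push(`(${cue.soundDescription})`);
  }
  return parts.join(" ");
}

// Example: one tagged cue, two different renderings of the same data.
const cue: RichCue = {
  start: 12.0,
  end: 14.5,
  text: "Get out of my house!",
  speaker: "MARGARET",
  emotion: "furious",
  intensity: 0.9,
};

console.log(renderCue(cue, { showSpeaker: true, showAffect: true, showNonSpeech: true }));
// -> "MARGARET: Get out of my house! [furious]"
console.log(renderCue(cue, { showSpeaker: false, showAffect: false, showNonSpeech: false }));
// -> "Get out of my house!"
```

The design point this sketch illustrates is that personalization lives in the renderer, not the file: the same tagged cue can be rendered plainly or expressively without re-authoring the captions, which is what would make reproducible study stimuli possible.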



Published In

ASSETS '24: Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility
October 2024
1475 pages
ISBN: 9798400706776
DOI: 10.1145/3663548
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2024

Author Tags

  1. Affective Captioning
  2. Closed-Captioning
  3. Creative Captioning
  4. Subtitles

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

ASSETS '24

Acceptance Rates

Overall acceptance rate: 436 of 1,556 submissions (28%)

