Research article
DOI: 10.1145/3529190.3534770

Towards a DHH Accessible Theater: Real-Time Synchronization of Subtitles and Sign Language Videos with ASR and NLP Solutions

Published: 11 July 2022 Publication History

Abstract

Although notable progress has been made in some areas of cultural life, such as certain online video platforms and TV programs, to provide content accessible to Deaf and Hard of Hearing (DHH) people, the same cannot be said for live theater performances. In this work, a system called NLP-Theatre is presented, with emphasis on the technical aspects relevant to the problem of DHH inclusivity. The leading concept behind the system is the automatic real-time extraction of the actors’ speech transcripts and their matching to a pre-arranged set of subtitles corresponding to the theater script at hand. The matching accuracy scores indicate the correct timings for displaying each subtitle to the audience during a live performance and for the concurrent activation of several other events that make possible the synchronization of multiple streams of information directed towards the spectators. One such stream is the video of sign language interpretation, which is also synchronized in real time with the live performance and mainly targets DHH individuals. A subjective evaluation of the subtitle and sign language streams is presented, based on questionnaire responses provided by the guests of an experimental theater show. Furthermore, purely technical experiments were run, centered on the performance assessment of the two Speech-To-Text (STT) alternatives employed in the system: a commercial general-purpose remote solution and a custom ASR service. The latter employs a script-specific TDNN-HMM model that runs locally and is created using a Language Model (LM) adaptation strategy.
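The core matching idea described above can be sketched as scoring each incoming ASR hypothesis against the pre-arranged subtitle lines and firing the display event when the best score clears a threshold. The following is a minimal illustrative sketch, assuming a normalized string-similarity score; the subtitle lines, function names, and threshold value are illustrative assumptions, not the paper's actual algorithm.

```python
from difflib import SequenceMatcher

# Illustrative pre-arranged subtitles corresponding to a theater script
# (the real system would load these from the actual script at hand).
SUBTITLES = [
    "good evening ladies and gentlemen",
    "the storm is coming tonight",
    "we must leave before dawn",
]

def match_subtitle(asr_hypothesis: str, subtitles, threshold: float = 0.6):
    """Score an ASR hypothesis against every subtitle line and return
    (index, score) of the best match, or (None, best_score) if no
    candidate clears the threshold."""
    best_idx, best_score = None, 0.0
    for i, line in enumerate(subtitles):
        # Normalized similarity in [0, 1] between hypothesis and script line.
        score = SequenceMatcher(None, asr_hypothesis.lower(), line).ratio()
        if score > best_score:
            best_idx, best_score = i, score
    if best_score >= threshold:
        return best_idx, best_score
    return None, best_score
```

In a live setting, a confirmed match would both display the subtitle and trigger the synchronized events (e.g., seeking the sign language interpretation video to the corresponding segment); the threshold trades off missed cues against premature triggers on partial ASR output.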


Cited By

  • (2024) Enhancing Education for Deaf People: A Systematic Review of NLP Strategies for Automatic Translation From Portuguese to Brazilian Sign Language. 2024 IEEE Frontiers in Education Conference (FIE), 1–8. DOI: 10.1109/FIE61694.2024.10892982. Online publication date: 13 Oct 2024.
  • (2024) A method for real-time translation of online video subtitles in sports events. Signal, Image and Video Processing 19(2). DOI: 10.1007/s11760-024-03606-2. Online publication date: 18 Dec 2024.
  • (2024) EasyCaption: Investigating the Impact of Prolonged Exposure to Captioning on VR HMD on General Population. Universal Access in Human-Computer Interaction, 382–403. DOI: 10.1007/978-3-031-60881-0_24. Online publication date: 1 Jun 2024.

Published In

          PETRA '22: Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments
          June 2022
          704 pages
          ISBN:9781450396318
          DOI:10.1145/3529190

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. LM adaptation
          2. STT model
          3. sign language video synchronization
          4. subtitles alignment

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Funding Sources

          • European Regional Development Fund (ERDF)
