Research article
DOI: 10.1145/3529190.3534770

Towards a DHH Accessible Theater: Real-Time Synchronization of Subtitles and Sign Language Videos with ASR and NLP Solutions

Published: 11 July 2022 Publication History

Abstract

Although notable progress has been made in some areas of cultural life, such as certain online video platforms and TV programs, to provide content accessible to Deaf and Hard of Hearing (DHH) people, the same cannot be said for live theater performances. In this work, a system called NLP-Theatre is presented, with emphasis on the technical aspects relevant to the problem of DHH inclusivity. The leading concept behind the system is the automatic real-time extraction of the actors’ speech transcripts and their matching to a pre-arranged set of subtitles corresponding to the theater script at hand. The matching accuracy scores indicate the correct timings for displaying each subtitle to the audience during a live performance and for the concurrent activation of several other events that make possible the synchronization of multiple streams of information directed towards the spectators. One such stream is the video of sign language interpretation, which is also synchronized in real time with the live performance and mainly targets DHH individuals. A subjective evaluation of the subtitle and sign language streams is presented, based on questionnaire responses provided by the guests of an experimental theater show. Furthermore, purely technical experiments were run, centered on the performance assessment of the two Speech-To-Text (STT) alternatives employed in the system: a commercial general-purpose remote solution and a custom ASR service. The latter employs a script-specific TDNN-HMM model that runs locally and is created using a Language Model (LM) adaptation strategy.
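The core matching idea described above can be sketched as scoring each incoming ASR hypothesis against the pre-arranged subtitle lines and firing the display event when the best score clears a threshold. The following is a minimal illustrative sketch, assuming a normalized string-similarity score; the subtitle lines, function names, and threshold value are illustrative assumptions, not the paper's actual algorithm.

```python
from difflib import SequenceMatcher

# Illustrative pre-arranged subtitles corresponding to a theater script
# (the real system would load these from the actual script at hand).
SUBTITLES = [
    "good evening ladies and gentlemen",
    "the storm is coming tonight",
    "we must leave before dawn",
]

def match_subtitle(asr_hypothesis: str, subtitles, threshold: float = 0.6):
    """Score an ASR hypothesis against every subtitle line and return
    (index, score) of the best match, or (None, best_score) if no
    candidate clears the threshold."""
    best_idx, best_score = None, 0.0
    for i, line in enumerate(subtitles):
        # Normalized similarity in [0, 1] between hypothesis and script line.
        score = SequenceMatcher(None, asr_hypothesis.lower(), line).ratio()
        if score > best_score:
            best_idx, best_score = i, score
    if best_score >= threshold:
        return best_idx, best_score
    return None, best_score
```

In a live setting, a confirmed match would both display the subtitle and trigger the synchronized events (e.g., seeking the sign language interpretation video to the corresponding segment); the threshold trades off missed cues against premature triggers on partial ASR output.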


Cited By

  • (2024) Enhancing Education for Deaf People: A Systematic Review of NLP Strategies for Automatic Translation From Portuguese to Brazilian Sign Language. 2024 IEEE Frontiers in Education Conference (FIE), 1–8. DOI: 10.1109/FIE61694.2024.10892982. Online publication date: 13 Oct 2024.
  • (2024) A method for real-time translation of online video subtitles in sports events. Signal, Image and Video Processing 19(2). DOI: 10.1007/s11760-024-03606-2. Online publication date: 18 Dec 2024.
  • (2024) EasyCaption: Investigating the Impact of Prolonged Exposure to Captioning on VR HMD on General Population. Universal Access in Human-Computer Interaction, 382–403. DOI: 10.1007/978-3-031-60881-0_24. Online publication date: 1 Jun 2024.

Published In

          PETRA '22: Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments
          June 2022
          704 pages
          ISBN:9781450396318
          DOI:10.1145/3529190

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. LM adaptation
          2. STT model
          3. sign language video synchronization
          4. subtitles alignment

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Funding Sources

          • European Regional Development Fund (ERDF)
