Abstract
Closed Captioning (CC) is a telecommunications service that displays text equivalent to the audio content of a broadcast. Although the primary consumer group is Deaf (D) and Hard of Hearing (HoH) viewers, they are typically excluded from the quality assessment process. Including D and HoH viewers in every assessment is nearly impossible and would require enormous effort. One solution is to use machine learning algorithms to replicate human subjective evaluation of CC quality. In this paper, a multi-label classifier was trained with an active learning algorithm involving D and HoH viewers to predict human subjective ratings across several CC quality dimensions. An online user study was conducted to train a multilayer perceptron on subjective quality ratings collected from D and HoH participants for machine-queried CC encoded onto short video clips in various genres. The results demonstrate the feasibility of an automated assessment system that predicts the human-perceived quality of CC, as well as viewer preferences and perception behaviour when viewing CC with deliberately introduced errors.
Availability of data and materials
The data that support the findings of this study are available on request from the corresponding author SN. The data are not publicly available due to them containing information that could compromise research participant privacy/consent.
Code Availability
Not applicable
Acknowledgements
We thank Broadcasting Accessibility Fund (BAF), and Natural Sciences and Engineering Research Council (NSERC) for their support. We also thank Christie Christelis, the Canadian Association of Broadcasters (CAB), and the Steering Committee for “Understanding User Responses to Live Closed Captioning in Canada” for generously providing the caption and video samples. Also, thanks to all participants that took part in the user survey.
Funding
This study was funded by the Broadcasting Accessibility Fund, and the Natural Sciences and Engineering Research Council of Canada.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Somang Nam. The first draft of the manuscript was written by Somang Nam and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Ethics approval
The questionnaire and methodology for this study were approved by the Human Research Ethics committee of the University of Toronto (RIS Protocol Number: 38671).
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent for publication
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Video clips used in the active learning study
| Clip number | Genre | Duration (s) | Number of English sentences | WPM (audio) |
|---|---|---|---|---|
| 1 | Sports talk show | 19 | 3 | 301.25 |
| 2 | Sports talk show | 22 | 3 | 257.48 |
| 3 | Sports talk show | 20 | 3 | 199.37 |
| 4 | NHL broadcast | 22 | 2 | 199.78 |
| 5 | NHL broadcast | 21 | 2 | 210.10 |
| 6 | NHL broadcast | 19 | 3 | 173.88 |
| 7 | Weather forecast | 22 | 3 | 225.61 |
| 8 | Weather forecast | 25 | 3 | 200.62 |
| 9 | Weather forecast | 24 | 3 | 195.49 |
| 10 | Breakfast talk show | 20 | 3 | 258.82 |
| 11 | Breakfast talk show | 26 | 3 | 264.47 |
| 12 | Breakfast talk show | 25 | 2 | 216.12 |
| 13 | Weather forecast | 25 | 2 | 211.06 |
| 14 | Weather forecast | 29 | 3 | 268.97 |
| 15 | Weather forecast | 24 | 2 | 252.31 |
| 16 | Weather forecast | 15 | 3 | 288.63 |
| 17 | Weather forecast | 20 | 3 | 282.27 |
| 18 | Weather forecast | 21 | 3 | 308.78 |
| 19 | Weather forecast | 22 | 2 | 295.20 |
| 20 | Weather forecast | 18 | 2 | 243.19 |
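The WPM (audio) column can be reproduced from a clip's spoken word count and duration. A minimal sketch of that conversion (the word count below is illustrative, not a value from the study):

```python
def words_per_minute(word_count: int, duration_s: float) -> float:
    """Speech rate in words per minute for a clip of duration_s seconds."""
    return word_count * 60.0 / duration_s

# Illustrative example: a 20-second clip containing 100 spoken words
print(round(words_per_minute(100, 20), 2))  # 300.0
```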
Appendix B: Calculation of the potential number of participants to achieve 100% PA
The overall average Percent Agreement (PA) is 73% for the D group and 76.92% for the HoH group. The rate of change (slope) of the PA trendline is 0.237 for the D group and 0.607 for the HoH group.
A simple linear equation can be formed as \(ax + b = y\), where:

- \(y\): current average PA
- \(x\): number of participants
- \(a\): rate of change (slope) from the trendline
- \(b\): constant term
Then, by substitution,
| Deaf group | Hard of Hearing group |
|---|---|
| \(\begin{array}{r} 73 = 0.237 \times 15 + b \\ b = 73 - 3.555 \\ b = 69.45 \end{array}\) | \(\begin{array}{r} 76.92 = 0.607 \times 15 + b \\ b = 76.92 - 9.105 \\ b = 67.82 \end{array}\) |
| To achieve 100% PA, | To achieve 100% PA, |
| \(\begin{array}{r} 100 = 0.237 \times x + 69.45 \\ 30.55 = 0.237 \times x \\ x = 30.55 / 0.237 \approx 129 \end{array}\) | \(\begin{array}{r} 100 = 0.607 \times x + 67.82 \\ 32.18 = 0.607 \times x \\ x = 32.18 / 0.607 \approx 53 \end{array}\) |
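The substitution above can be checked programmatically. A minimal sketch, assuming the linear trendline model of Appendix B (the function name is illustrative):

```python
def required_participants(current_pa: float, slope: float,
                          n_current: int, target_pa: float = 100.0) -> float:
    """Extrapolate the number of participants needed to reach target_pa,
    assuming PA grows linearly: PA = slope * n + b."""
    # Solve for the intercept from the current observation
    b = current_pa - slope * n_current
    # Solve target_pa = slope * x + b for x
    return (target_pa - b) / slope

# Deaf group: PA = 73%, slope 0.237, 15 participants
print(round(required_participants(73, 0.237, 15)))     # 129
# Hard of Hearing group: PA = 76.92%, slope 0.607, 15 participants
print(round(required_participants(76.92, 0.607, 15)))  # 53
```

Note the strong assumption: PA is unlikely to grow linearly all the way to 100%, so these figures are lower-bound estimates under the trendline model rather than exact sample sizes.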
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nam, S., Fels, D. & Chignell, M. Developing a closed captioning quality assessment system using a multi-label classifier with active learning from deaf and hard of hearing viewers. Appl Intell 53, 22882–22897 (2023). https://doi.org/10.1007/s10489-023-04677-3