Abstract
Closed Captioning (CC) is a telecommunications service that displays text equivalent to the audio content of a broadcast. Although the primary consumer group is Deaf (D) and Hard of Hearing (HoH) viewers, they are typically excluded from the quality assessment process. Including D and HoH viewers in every assessment is nearly impossible and would require enormous effort. One solution is to use machine learning algorithms to replicate human subjective evaluation of CC quality. In this paper, a multi-label classifier was trained with an active learning algorithm involving D and HoH viewers to predict human subjective ratings across several CC quality dimensions. An online user study was conducted to train a multilayer perceptron on subjective quality ratings collected from D and HoH participants for machine-queried CC encoded onto short video clips in various genres. The results demonstrate the feasibility of an automated assessment system that predicts the human-perceived quality of CC, as well as viewer preferences and perception behaviour when viewing CC with deliberately introduced errors.
Availability of data and materials
The data that support the findings of this study are available on request from the corresponding author SN. The data are not publicly available due to them containing information that could compromise research participant privacy/consent.
Code Availability
Not applicable
Acknowledgements
We thank Broadcasting Accessibility Fund (BAF), and Natural Sciences and Engineering Research Council (NSERC) for their support. We also thank Christie Christelis, the Canadian Association of Broadcasters (CAB), and the Steering Committee for “Understanding User Responses to Live Closed Captioning in Canada” for generously providing the caption and video samples. Also, thanks to all participants that took part in the user survey.
Funding
This study was funded by the Broadcasting Accessibility Fund, and the Natural Sciences and Engineering Research Council of Canada.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Somang Nam. The first draft of the manuscript was written by Somang Nam and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Ethics approval
The questionnaire and methodology for this study were approved by the Human Research Ethics committee of the University of Toronto (RIS Protocol Number: 38671).
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent for publication
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Video clips used in the active learning study
| Clip number | Genre | Duration (s) | Number of English sentences | WPM (audio) |
|---|---|---|---|---|
| 1 | Sports talk show | 19 | 3 | 301.25 |
| 2 | Sports talk show | 22 | 3 | 257.48 |
| 3 | Sports talk show | 20 | 3 | 199.37 |
| 4 | NHL broadcast | 22 | 2 | 199.78 |
| 5 | NHL broadcast | 21 | 2 | 210.10 |
| 6 | NHL broadcast | 19 | 3 | 173.88 |
| 7 | Weather forecast | 22 | 3 | 225.61 |
| 8 | Weather forecast | 25 | 3 | 200.62 |
| 9 | Weather forecast | 24 | 3 | 195.49 |
| 10 | Breakfast talk show | 20 | 3 | 258.82 |
| 11 | Breakfast talk show | 26 | 3 | 264.47 |
| 12 | Breakfast talk show | 25 | 2 | 216.12 |
| 13 | Weather forecast | 25 | 2 | 211.06 |
| 14 | Weather forecast | 29 | 3 | 268.97 |
| 15 | Weather forecast | 24 | 2 | 252.31 |
| 16 | Weather forecast | 15 | 3 | 288.63 |
| 17 | Weather forecast | 20 | 3 | 282.27 |
| 18 | Weather forecast | 21 | 3 | 308.78 |
| 19 | Weather forecast | 22 | 2 | 295.20 |
| 20 | Weather forecast | 18 | 2 | 243.19 |
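The WPM (audio) column can be reproduced from a clip's spoken word count and duration. A minimal sketch of that conversion (the word count below is illustrative, not a value from the study):

```python
def words_per_minute(word_count: int, duration_s: float) -> float:
    """Speech rate in words per minute for a clip of duration_s seconds."""
    return word_count * 60.0 / duration_s

# Illustrative example: a 20-second clip containing 100 spoken words
print(round(words_per_minute(100, 20), 2))  # 300.0
```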
Appendix B: Calculation of the potential number of participants to achieve 100% PA
The overall average Percent Agreement (PA) is 73% for the D group and 76.92% for the HoH group. The rate of change (slope) of the PA trendline is 0.237 for the D group and 0.607 for the HoH group.
A simple linear equation can be formed as \(ax + b = y\), where:

- \(y\): current average PA
- \(x\): number of participants
- \(a\): rate of change (slope) from the trendline
- \(b\): constant term
Then, by substitution,
| Deaf group | Hard of Hearing group |
|---|---|
| \(\begin{array}{r} 73 = 0.237 \times 15 + b \\ b = 73 - 3.555 \\ b = 69.45 \end{array}\) | \(\begin{array}{r} 76.92 = 0.607 \times 15 + b \\ b = 76.92 - 9.105 \\ b = 67.82 \end{array}\) |
| To achieve 100% PA, | To achieve 100% PA, |
| \(\begin{array}{r} 100 = 0.237 \times x + 69.45 \\ 30.55 = 0.237 \times x \\ x = 30.55 / 0.237 \approx 129 \end{array}\) | \(\begin{array}{r} 100 = 0.607 \times x + 67.82 \\ 32.18 = 0.607 \times x \\ x = 32.18 / 0.607 \approx 53 \end{array}\) |
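The substitution above can be checked programmatically. A minimal sketch, assuming the linear trendline model of Appendix B (the function name is illustrative):

```python
def required_participants(current_pa: float, slope: float,
                          n_current: int, target_pa: float = 100.0) -> float:
    """Extrapolate the number of participants needed to reach target_pa,
    assuming PA grows linearly: PA = slope * n + b."""
    # Solve for the intercept from the current observation
    b = current_pa - slope * n_current
    # Solve target_pa = slope * x + b for x
    return (target_pa - b) / slope

# Deaf group: PA = 73%, slope 0.237, 15 participants
print(round(required_participants(73, 0.237, 15)))     # 129
# Hard of Hearing group: PA = 76.92%, slope 0.607, 15 participants
print(round(required_participants(76.92, 0.607, 15)))  # 53
```

Note the strong assumption: PA is unlikely to grow linearly all the way to 100%, so these figures are lower-bound estimates under the trendline model rather than exact sample sizes.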
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nam, S., Fels, D. & Chignell, M. Developing a closed captioning quality assessment system using a multi-label classifier with active learning from deaf and hard of hearing viewers. Appl Intell 53, 22882–22897 (2023). https://doi.org/10.1007/s10489-023-04677-3