
Developing a closed captioning quality assessment system using a multi-label classifier with active learning from deaf and hard of hearing viewers

Published in: Applied Intelligence

Abstract

Closed Captioning (CC) is a telecommunications service that displays text equivalent to a program's audio. Although Deaf (D) and Hard of Hearing (HoH) viewers are the primary consumers of captioning, they are typically excluded from the quality assessment process. Including D and HoH viewers in every assessment is nearly impossible and would require enormous effort. One solution is to use machine learning algorithms to replicate human subjective evaluation of CC quality. In this paper, a multi-label classifier was trained with an active learning algorithm involving D and HoH viewers to predict human subjective ratings of various aspects of CC quality. An online user study was conducted in which D and HoH participants rated machine-queried captions, containing deliberately introduced errors, encoded onto short video clips from various genres; these ratings were used to train a multilayer perceptron. The results reveal the possibility of using an automated assessment system to predict the human-perceived quality of CC, as well as human preferences and perceiving behaviour when viewing CC with deliberate error inclusion.
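As a rough illustration of the approach described above — not the authors' implementation, whose features, labels, and query strategy are not specified here — the following sketch trains a scikit-learn multilayer perceptron as a multi-label classifier and uses pool-based uncertainty sampling to pick the next caption clip for a viewer to rate. All feature and label dimensions are invented for the example.

```python
# Illustrative sketch only: the features, label dimensions, and query
# strategy (uncertainty sampling) are assumptions for this example,
# not the authors' published pipeline.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy pool of caption clips: each row holds numeric caption features
# (e.g. delay, speed, error counts); each of the 3 label columns is
# one binary quality judgement a viewer could give.
X_pool = rng.normal(size=(200, 5))
y_pool = (X_pool[:, :3] > 0).astype(int)

labeled = list(range(20))                  # seed set of viewer ratings
unlabeled = [i for i in range(200) if i not in labeled]

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)

for _ in range(5):                         # five active-learning rounds
    clf.fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool[unlabeled])
    # Query the clip whose label probabilities sit closest to 0.5,
    # i.e. the one the model is least sure how a viewer would rate.
    query = unlabeled[int(np.abs(proba - 0.5).mean(axis=1).argmin())]
    labeled.append(query)                  # a D/HoH viewer rates it
    unlabeled.remove(query)

print(clf.predict(X_pool[:3]))             # multi-label predictions
```

In the study itself, the "oracle" answering each query is a human participant rather than a stored label, which is what makes active learning attractive: the model asks for ratings only on the clips it finds most ambiguous, keeping the demand on D and HoH viewers small.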


Availability of data and materials

The data that support the findings of this study are available from the corresponding author, SN, on request. The data are not publicly available because they contain information that could compromise research participant privacy and consent.

Code Availability

Not applicable


Acknowledgements

We thank the Broadcasting Accessibility Fund (BAF) and the Natural Sciences and Engineering Research Council (NSERC) for their support. We also thank Christie Christelis, the Canadian Association of Broadcasters (CAB), and the Steering Committee for “Understanding User Responses to Live Closed Captioning in Canada” for generously providing the caption and video samples. Finally, we thank all participants who took part in the user survey.

Funding

This study was funded by the Broadcasting Accessibility Fund, and the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Somang Nam. The first draft of the manuscript was written by Somang Nam and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Somang Nam.

Ethics declarations

Competing interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Ethics approval

The questionnaire and methodology for this study were approved by the Human Research Ethics committee of the University of Toronto (RIS Protocol Number: 38671).

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent for publication

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Video clips used in the active learning study

Clip number   Genre                 Duration (s)   Number of English sentences   WPM (audio)
1             Sports talk show      19             3                             301.25
2             Sports talk show      22             3                             257.48
3             Sports talk show      20             3                             199.37
4             NHL broadcast         22             2                             199.78
5             NHL broadcast         21             2                             210.10
6             NHL broadcast         19             3                             173.88
7             Weather forecast      22             3                             225.61
8             Weather forecast      25             3                             200.62
9             Weather forecast      24             3                             195.49
10            Breakfast talk show   20             3                             258.82
11            Breakfast talk show   26             3                             264.47
12            Breakfast talk show   25             2                             216.12
13            Weather forecast      25             2                             211.06
14            Weather forecast      29             3                             268.97
15            Weather forecast      24             2                             252.31
16            Weather forecast      15             3                             288.63
17            Weather forecast      20             3                             282.27
18            Weather forecast      21             3                             308.78
19            Weather forecast      22             2                             295.20
20            Weather forecast      18             2                             243.19

Appendix B: Calculation of the potential number of participants to achieve 100% PA

The overall average Percent Agreement (PA) was 73% for the Deaf (D) group and 76.92% for the Hard of Hearing (HoH) group. The slope of the PA trendline was 0.237 for the D group and 0.607 for the HoH group.

A simple linear equation, \(ax + b = y\), can be formed, where:

  • y: current average PA

  • x: number of participants

  • a: rate of change from trendline

  • b: intercept constant

Then, substituting the current sample of 15 participants per group:

Deaf group:

\(\begin{array}{r} 73 = 0.237 \times 15 + b \\ b = 73 - 3.555 \\ b \approx 69.45 \end{array}\)

Hard of Hearing group:

\(\begin{array}{r} 76.92 = 0.607 \times 15 + b \\ b = 76.92 - 9.105 \\ b \approx 67.82 \end{array}\)

To achieve 100% PA:

Deaf group:

\(\begin{array}{r} 100 = 0.237 \times x + 69.45 \\ 30.55 = 0.237 \times x \\ x = 30.55 / 0.237 \approx 129 \end{array}\)

Hard of Hearing group:

\(\begin{array}{r} 100 = 0.607 \times x + 67.82 \\ 32.18 = 0.607 \times x \\ x = 32.18 / 0.607 \approx 53 \end{array}\)
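The extrapolation above can be reproduced in a few lines; the averages and trendline slopes are taken directly from this appendix, and the function name is ours:

```python
# Reproduces the appendix arithmetic: extrapolate the linear PA
# trendline y = a*x + b from the current sample to y = 100%.
def participants_for_full_agreement(avg_pa, slope, n_current=15):
    """Return the number of participants x at which PA reaches 100%."""
    intercept = avg_pa - slope * n_current      # b = y - a*x
    return round((100 - intercept) / slope)     # x at 100% PA

print(participants_for_full_agreement(73.0, 0.237))    # Deaf group -> 129
print(participants_for_full_agreement(76.92, 0.607))   # HoH group  -> 53
```

Note that this is a simple linear extrapolation from a 15-participant sample, so the resulting figures (129 and 53) are rough estimates rather than guarantees.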

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Nam, S., Fels, D. & Chignell, M. Developing a closed captioning quality assessment system using a multi-label classifier with active learning from deaf and hard of hearing viewers. Appl Intell 53, 22882–22897 (2023). https://doi.org/10.1007/s10489-023-04677-3

