Abstract
This study presents a new technique for translating subtitles in sports events, addressing the challenges of real-time translation with improved accuracy and efficiency. Unlike standard methods, which often produce delayed or inaccurate subtitles, the proposed method integrates advanced annotation techniques and machine learning algorithms to improve subtitle recognition and extraction. Annotation here means systematically labeling spoken elements such as commentary and dialogue, enabling accurate subtitle recognition and real-time adjustment in live sports broadcasts so that both accuracy and contextual relevance are preserved. These ideas allow seamless adaptation to multiple speech sources, including the voices of commentators, off-site hosts, and athletes, while keeping critical information within strict word-count limits. Key improvements include faster processing times and higher translation precision, both crucial in the dynamic environment of live sports broadcasts. The study builds on prior work in audiovisual translation, tailoring its strategy to the unique demands of sports media. By emphasizing clear and contextually appropriate real-time subtitles, this research advances beyond existing methods and offers insights for future translation projects in sports and similar contexts. The results contribute to a more effective subtitle translation framework, enhancing accessibility and the viewing experience for audiences during live sports events.
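To make the pipeline described above concrete, the following is a minimal sketch of how recognition output, speaker-aware translation, and word-count limiting might be composed. The class, the function names, and the 12-word limit are illustrative assumptions for this sketch, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical names and limits throughout; this sketches the shape of the
# pipeline described in the abstract, not the authors' implementation.

@dataclass
class SubtitleEvent:
    speaker: str   # annotation label: "commentator", "off-site host", "athlete"
    text: str      # recognized source-language line
    start_ms: int
    end_ms: int

MAX_WORDS = 12     # assumed per-line word-count limit

def translate(text: str, speaker: str) -> str:
    """Placeholder for the ML translation step; any MT backend could sit here.

    The speaker annotation lets the backend adapt tone and terminology.
    """
    return f"[{speaker}] {text}"  # stand-in output

def fit_word_limit(text: str, max_words: int = MAX_WORDS) -> str:
    """Trim a translated line so critical leading content fits the limit."""
    return " ".join(text.split()[:max_words])

def process(event: SubtitleEvent) -> str:
    """Recognized line -> speaker-aware translation -> length-limited subtitle."""
    return fit_word_limit(translate(event.text, event.speaker))

print(process(SubtitleEvent("commentator", "What a goal in the final minute!", 0, 2000)))
```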
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
- NLP: Natural language processing
- CV: Computer vision
- ML: Machine learning
- DL: Deep learning
- SLR: Systematic literature review
- NER: Named entity recognition
- DHH: Deaf and hard of hearing
- IBM: International Business Machines Corporation
- HMM: Hidden Markov model
- EM: Expectation–maximization
- GB: Gigabytes
- OCR: Optical character recognition
- BP: Length-based penalty factor (brevity penalty; see the worked formulas after this list)
- BLEU: Bilingual evaluation understudy
- LS: Length statistics
- CLM: Character-level modeling
- ES: Experimental setup
- TT: Training time
- ASD: Abnormal subtitle displays
- RA: Relationship analysis
- N: Gray level series of the image/total pixels
- z: Detection framework
- S: Mean value of the gray-level difference of adjacent frames over the whole video (see the detection sketch after this list)
- \(P_{r}(\overline{e}_{l},\overline{f})\): Number of times the phrase pair appears in the corpus
- M: Length of the video frame sequence
- r: Window size
- W: Inter-frame difference measurement of each frame
- g: Function
- A: Gray value histogram
- L: Inter-frame difference measurement of each frame
- j: Frame index
- k: Cumulative number of blocks
- D: Euclidean distance
- F: Sobel gradient amplitude
- k: Weighting factor for the Sobel operator
- ∏: Product operator
- G: Horizontal template for convolution
- g: Vertical template for convolution
- x, y: Pixel coordinates
- E: Translation probability estimation
- ξ: Normalization factor
- γ: Number of times a phrase appears in the target sentence
- δ: Translation probability
- τ: Number of times a word appears in the target sentence
- n: Number of word pairs
- a_j(x): Gray value histogram of frame j
- b_k(y): Gray value histogram of frame k
- F_j(x, y): Gray value at pixel point (x, y) in frame j
- F_k(x, y): Gray value at pixel point (x, y) in frame k
- G_x: Sobel gradient in the horizontal direction
- G_y: Sobel gradient in the vertical direction
- U_1: Gradient matrix from the horizontal template
- U_2: Gradient matrix from the vertical template
- \(Ecount(P_{r}(\overline{e}_{l},\overline{f}))\): Parallel bilingual phrase pair
- w(e_i, f_i): Lexicalized weighted feature between words e_i and f_i
- count(f_i, e_i): Number of times the word pair (f_i, e_i) appears in the corpus
- w_n: Weight of the co-occurring n-grams
- p_n: Precision of the n-grams
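Several of the video symbols above (a_j(x), b_k(y), D, G_x, G_y, F, and the weighting factor k) describe a histogram-plus-gradient measure of inter-frame change of the kind used to locate subtitle transitions. Below is a minimal NumPy sketch of such a measure, assuming 8-bit grayscale frames and the standard 3×3 Sobel templates; it illustrates the general technique rather than the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import convolve

# Standard 3x3 Sobel templates, playing the role of the horizontal
# template G and vertical template g from the symbol list above.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def gray_histogram(frame: np.ndarray, n_levels: int = 256) -> np.ndarray:
    """Normalized gray-value histogram, in the role of a_j(x) / b_k(y)."""
    hist, _ = np.histogram(frame, bins=n_levels, range=(0, n_levels))
    return hist / frame.size

def histogram_distance(frame_j: np.ndarray, frame_k: np.ndarray) -> float:
    """D: Euclidean distance between the histograms of adjacent frames."""
    return float(np.linalg.norm(gray_histogram(frame_j) - gray_histogram(frame_k)))

def sobel_amplitude(frame: np.ndarray) -> np.ndarray:
    """F: gradient amplitude combining G_x and G_y."""
    gx = convolve(frame.astype(float), SOBEL_X)   # U_1: horizontal-template gradient
    gy = convolve(frame.astype(float), SOBEL_Y)   # U_2: vertical-template gradient
    return np.hypot(gx, gy)

def frame_change_score(frame_j, frame_k, k_weight: float = 0.5) -> float:
    """Histogram distance plus a k-weighted gradient-change term (assumed combination)."""
    grad_change = float(np.abs(sobel_amplitude(frame_j) - sobel_amplitude(frame_k)).mean())
    return histogram_distance(frame_j, frame_k) + k_weight * grad_change
```

A frame whose score stands out against the video-wide mean difference S would then be flagged as a candidate subtitle transition and passed on to OCR.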
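The translation-related symbols (δ, w(e_i, f_i), count(f_i, e_i), BP, w_n, p_n) follow the standard phrase-based machine translation and BLEU conventions. As a reference point, the conventional formulas from the literature (not necessarily the paper's exact variants) are:

$$
\delta(\overline{f}\mid\overline{e}) \;=\; \frac{\operatorname{count}(\overline{e},\overline{f})}{\sum_{\overline{f}'} \operatorname{count}(\overline{e},\overline{f}')},
\qquad
w(e_i, f_i) \;=\; \frac{\operatorname{count}(f_i, e_i)}{\sum_{f} \operatorname{count}(f, e_i)},
$$

$$
\mathrm{BLEU} \;=\; BP \cdot \exp\!\left(\sum_{n=1}^{N} w_n \log p_n\right),
\qquad
BP \;=\;
\begin{cases}
1, & c > r,\\
e^{\,1 - r/c}, & c \le r,
\end{cases}
$$

where \(c\) is the candidate translation length and \(r\) the reference length (a separate use of \(r\) from the window size above).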
Acknowledgements
The manuscript has been read and approved by all authors, the requirements for authorship have been met, and each author believes that the manuscript represents honest work.
Funding
This research is a periodical achievement of the Gansu Province Philosophy and Social Science Planning Project (2021YB019).
Author information
Contributions
Liu Qiang: Writing—original draft preparation, conceptualization, supervision, project administration. Zeng Zhiliang: formal analysis, methodology. Wang Lei: software, validation.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
All authors have been personally and actively involved in substantial work leading to the paper and will take public responsibility for its content.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhiliang, Z., Lei, W. & Qiang, L. A method for real-time translation of online video subtitles in sports events. SIViP 19, 146 (2025). https://doi.org/10.1007/s11760-024-03606-2