
A method for real-time translation of online video subtitles in sports events

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

This study presents a new technique for translating subtitles in sports events, addressing the problem of real-time translation with improved accuracy and efficiency. Unlike standard methods, which often produce delayed or inaccurate subtitles, the proposed method integrates advanced annotation techniques and machine learning algorithms to improve subtitle recognition and extraction. The annotation techniques systematically label spoken elements such as commentary and dialogue, enabling accurate subtitle recognition and real-time adjustment in live sports broadcasts while preserving both accuracy and contextual relevance. The approach adapts seamlessly to multiple speech sources, including the voices of commentators, off-site hosts, and athletes, while keeping critical information within strict word count limits. Key improvements include faster processing times and higher translation precision, both crucial in the dynamic environment of live sports broadcasts. The study builds on prior work in audiovisual translation, tailoring its strategy to the specific demands of sports media. By emphasizing clear and contextually appropriate real-time subtitles, the research advances beyond existing methods and offers useful guidance for future translation projects in sports and similar settings. The results contribute to a more effective subtitle translation framework, improving accessibility and the viewing experience for audiences of live sports events.
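
As a rough illustration of the pipeline the abstract describes (speaker-annotated recognition followed by translation under a strict word budget), the sketch below shows one way the pieces could fit together. Every name in it (Segment, fit_to_length, subtitle_stream, the 12-word limit) is a hypothetical stand-in, not the authors' implementation.

```python
# Minimal sketch of a live subtitle loop: each recognized segment carries a
# speaker annotation (commentator, off-site host, athlete), is translated,
# and is trimmed to a fixed word budget before display. Illustrative only.
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple

@dataclass
class Segment:
    speaker: str   # annotation label, e.g. "commentator" or "athlete"
    text: str      # recognized source-language text (from ASR or subtitle OCR)
    start: float   # display timestamp, in seconds
    end: float

WORD_LIMIT = 12    # assumed per-subtitle word budget; the paper's limit is not public

def fit_to_length(text: str, limit: int = WORD_LIMIT) -> str:
    """Keep a subtitle within a strict word count, marking any truncation."""
    words = text.split()
    return " ".join(words[:limit]) + ("…" if len(words) > limit else "")

def subtitle_stream(
    segments: Iterable[Segment],
    translate: Callable[[str], str],
) -> Iterator[Tuple[float, str, str]]:
    """Yield (timestamp, speaker, translated subtitle) tuples for live display."""
    for seg in segments:
        yield seg.start, seg.speaker, fit_to_length(translate(seg.text))
```

Because subtitle_stream is a lazy generator, each segment is translated only as it is consumed, which bounds per-subtitle latency to a single translation call.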


Data availability

No datasets were generated or analysed during the current study.

Abbreviations

NLP: Natural language processing

CV: Computer vision

ML: Machine learning

DL: Deep learning

SLR: Systematic literature review

NER: Named entity recognition

DHH: Deaf and hard of hearing

IBM: International Business Machines Corporation

HMM: Hidden Markov model

EM: Expectation–maximization

GB: Gigabytes

OCR: Optical character recognition

BP: Length-based penalty factor

BLEU: Bilingual evaluation understudy

LS: Length statistics

CLM: Character-level modeling

ES: Experimental setup

TT: Training time

ASD: Abnormal subtitle displays

RA: Relationship analysis

N: Gray-level series of the image / total number of pixels

z: Detection framework

S: Mean gray-level difference between adjacent frames over the whole video

\(P_r(\overline{e}_l, \overline{f})\): Number of times the phrase pair appears in the corpus

M: Length of the video frame sequence

r: Window size

W: Inter-frame difference measurement of each frame

g: Function

A: Gray-value histogram

L: Inter-frame difference measurement of each frame

j: Frame index

k: Cumulative number of blocks

D: Euclidean distance

F: Sobel gradient amplitude

k: Weighting factor for the Sobel operator

∏: Product operator

G: Horizontal template for convolution

g: Vertical template for convolution

x, y: Pixel coordinates

E: Translation probability estimation

ξ: Normalization factor

γ: Number of times a phrase appears in the target sentence

δ: Translation probability

τ: Number of times a word appears in the target sentence

n: Number of word pairs

aj(x): Gray-value histogram of frame j

bk(y): Gray-value histogram of frame k

Fj(x,y): Gray value at pixel (x,y) in frame j

Fk(x,y): Gray value at pixel (x,y) in frame k

Gx: Sobel gradient in the horizontal direction

Gy: Sobel gradient in the vertical direction

U1: Gradient matrix from the horizontal template

U2: Gradient matrix from the vertical template

\(Ecount(P_r(\overline{e}_l, \overline{f}))\): Parallel bilingual phrase pair

w(ei, fi): Lexicalized weighting feature between words ei and fi

count(fi, ei): Number of times the word pair (fi, ei) appears in the corpus

wn: Weight of the corresponding co-occurring n-gram

pn: n-gram precision
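
Several of the image-side symbols above (the histograms aj and bk, the Euclidean distance D, the Sobel templates G and g, and the gradient amplitude F) describe standard inter-frame difference and edge measurements. As an illustration only, the following minimal Python sketch gives one conventional reading of those quantities; the normalization, boundary handling, and the role of the weighting factor k are assumptions, not the paper's exact formulas.

```python
# Hypothetical sketch of the frame-difference and Sobel measurements named
# in the symbol list; thresholds and normalization are assumed, not the
# paper's published values.
import numpy as np
from scipy.signal import convolve2d

def gray_histogram(frame: np.ndarray, levels: int = 256) -> np.ndarray:
    """Normalized gray-value histogram of a frame (aj(x) / bk(y) above)."""
    hist, _ = np.histogram(frame, bins=levels, range=(0, levels))
    return hist.astype(np.float64) / frame.size  # divide by N total pixels

def histogram_difference(frame_j: np.ndarray, frame_k: np.ndarray) -> float:
    """Euclidean distance D between the histograms of frames j and k."""
    return float(np.linalg.norm(gray_histogram(frame_j) - gray_histogram(frame_k)))

# Sobel convolution templates: G (horizontal) and g (vertical) in the notation.
G_h = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
G_v = G_h.T

def sobel_amplitude(frame: np.ndarray, k: float = 1.0) -> np.ndarray:
    """Gradient amplitude F = k * sqrt(Gx^2 + Gy^2), with weighting factor k."""
    Gx = convolve2d(frame, G_h, mode="same", boundary="symm")
    Gy = convolve2d(frame, G_v, mode="same", boundary="symm")
    return k * np.sqrt(Gx**2 + Gy**2)
```

A frame pair whose distance D exceeds a chosen threshold would be flagged as a shot or subtitle change; the threshold itself is a tuning parameter not given here.

On the translation side, BP, wn, and pn correspond to the standard BLEU definition, and the corpus counts suggest the usual relative-frequency estimate for the translation probability δ. Assuming those standard formulations (c and r below denote candidate and reference lengths, unrelated to the video symbols above):

```latex
% Standard BLEU: length-based penalty BP, n-gram weights w_n, precisions p_n.
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\Bigl(\sum_{n=1}^{N} w_n \log p_n\Bigr),
\qquad
\mathrm{BP} =
\begin{cases}
  1, & c > r\\
  e^{\,1 - r/c}, & c \le r
\end{cases}

% Assumed relative-frequency estimate for the phrase translation probability:
\delta(\bar{f}\mid\bar{e})
  = \frac{\mathrm{count}(\bar{f},\bar{e})}{\sum_{\bar{f}'} \mathrm{count}(\bar{f}',\bar{e})}
```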

Acknowledgements

The manuscript has been read and approved by all authors; the requirements for authorship, as stated earlier in this document, have been met; and each author believes that the manuscript represents honest work.

Funding

This research is supported by Gansu Province Philosophy and Social Science Planning Project Periodical Achievement (2021YB019).

Author information

Contributions

Liu Qiang: Writing—original draft preparation, conceptualization, supervision, project administration. Zeng Zhiliang: formal analysis, methodology. Wang Lei: software, validation.

Corresponding author

Correspondence to Liu Qiang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding the publication of this paper, and no known competing financial interests or personal relationships that could have appeared to influence the work reported here.

Ethical approval

All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhiliang, Z., Lei, W. & Qiang, L. A method for real-time translation of online video subtitles in sports events. SIViP 19, 146 (2025). https://doi.org/10.1007/s11760-024-03606-2
