
Scene text recognition and tracking to identify athletes in sport videos

Multimedia Tools and Applications

Abstract

We present an athlete identification module that forms part of a system for personalizing sport video broadcasts. The module localizes athletes in the scene, identifies them by reading the names or numbers printed on their uniforms, and labels the frames in which each athlete is visible. Building upon a previously published algorithm, we extract text candidates from individual frames and read them with an optical character recognizer (OCR). The OCR output is then compared against a known list of athletes’ names (or numbers) to produce a presence score for each athlete. Text regions are tracked across subsequent frames using a template matching technique. In this way, blurred or distorted text that the OCR would normally fail to read is exploited to provide a denser labelling of the video sequences. Extensive experiments show that the proposed method is fast, robust and reliable, and that it outperforms other systems reported in the literature.
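
The comparison between OCR output and the athlete list can be illustrated with a small sketch. The abstract does not say how the presence score is computed, so the following is only one plausible reading: it assumes a normalized string similarity taken from Python's standard `difflib.SequenceMatcher`, and the function names, the roster and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def presence_scores(ocr_text, roster):
    """Score every roster entry against one OCR-ed string (0.0 to 1.0).

    Assumption: similarity is SequenceMatcher.ratio(), i.e. 2*M/T, where M is
    the number of matched characters and T the total length of both strings.
    The paper's actual scoring function is not given in the abstract.
    """
    cleaned = ocr_text.strip().upper()
    return {
        name: SequenceMatcher(None, cleaned, name.upper()).ratio()
        for name in roster
    }


def best_match(ocr_text, roster, threshold=0.6):
    """Return (name, score) for the top entry, or (None, score) if too weak."""
    scores = presence_scores(ocr_text, roster)
    name = max(scores, key=scores.get)
    score = scores[name]
    return (name if score >= threshold else None), score


# Hypothetical roster; the OCR has confused the letter 'O' with the digit '0'.
roster = ["ROSSI", "BIANCHI", "VERDI"]
print(best_match("R0SSI", roster))  # the closest roster entry is ROSSI
```

Matching against a closed list is what makes a noisy OCR reading usable: a single misread character still leaves the correct name well ahead of the other entries, while a string that shares nothing with the roster falls below the threshold and is rejected.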



Acknowledgements

This work has been supported by the European Union under the STREP Project FP7 215248: My eDirector 2012. The authors would like to thank Paul Chippendale for his careful reading of the manuscript.

Author information

Corresponding author

Correspondence to Carla Maria Modena.

About this article

Cite this article

Messelodi, S., Modena, C.M. Scene text recognition and tracking to identify athletes in sport videos. Multimed Tools Appl 63, 521–545 (2013). https://doi.org/10.1007/s11042-011-0878-y

  • DOI: https://doi.org/10.1007/s11042-011-0878-y
