Abstract
We present an athlete identification module that forms part of a system for personalizing sport video broadcasts. The module localizes athletes in the scene, identifies them by reading the names or numbers printed on their uniforms, and labels the frames in which athletes are visible. Building upon a previously published algorithm, we extract text candidates from individual frames and read them by means of an optical character recognizer (OCR). The OCR-ed text is then compared to a known list of athletes’ names (or numbers) to provide a presence score for each athlete. Text regions are tracked in subsequent frames using a template matching technique. In this way, blurred or distorted text, normally unreadable by the OCR, is exploited to provide a denser labelling of the video sequences. Extensive experiments show that the proposed method is fast, robust and reliable, outperforming other systems reported in the literature.
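The comparison between OCR output and the known list of athletes can be illustrated with a minimal sketch. The abstract does not specify how the presence score is computed, so the similarity measure used here (the matching ratio from Python's standard `difflib`) is an assumption chosen for illustration, and the function name `presence_scores` and the roster entries are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Upper-case the text and keep only letters and digits."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def presence_scores(ocr_text, roster):
    """Return a score in [0, 1] for each roster entry given one OCR string.

    Assumption: the score is difflib's matching ratio, not the scoring
    rule used in the paper, which the abstract does not describe.
    """
    cleaned = normalize(ocr_text)
    scores = {}
    for name in roster:
        # ratio() is 2*M/T, where M is the number of matched characters
        # and T is the combined length of the two strings.
        scores[name] = SequenceMatcher(None, cleaned, normalize(name)).ratio()
    return scores


# Hypothetical roster and an OCR reading with one misread character (0 for O).
roster = ["ROSSI", "BIANCHI", "VERDI"]
scores = presence_scores("R0SSI", roster)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

A tolerant score of this kind lets a reading with a single misrecognized character still point to the right athlete, which is the situation the tracking of blurred or distorted text is meant to help with.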
Acknowledgements
This work has been supported by the European Union under the STREP Project FP7 215248: My eDirector 2012. The authors would like to thank Paul Chippendale for his careful reading of the manuscript.
Cite this article
Messelodi, S., Modena, C.M. Scene text recognition and tracking to identify athletes in sport videos. Multimed Tools Appl 63, 521–545 (2013). https://doi.org/10.1007/s11042-011-0878-y
DOI: https://doi.org/10.1007/s11042-011-0878-y