Abstract
Most OCR (Optical Character Recognition) systems designed to recognize text embedded in multimedia documents segment the text into characters before recognizing them. In this paper, we propose a novel approach that avoids any explicit character segmentation. Using a multi-scale scanning scheme, texts extracted from videos are first represented as sequences of learnt features. These representations are then fed to a connectionist recurrent model specifically designed to take into account dependencies between successive learnt features and to recognize texts. The proposed video OCR system, evaluated on a database of TV news videos, achieves very high recognition rates. Experiments also demonstrate that, for our recognition task, learnt feature representations perform better than hand-crafted features.
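To make the idea of multi-scale scanning concrete, the following is a minimal sketch of how a text image might be turned into a sequence of feature vectors without segmenting it into characters. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names, the window scales, the stride, and the `mean_intensity` placeholder (which stands in for a learnt feature extractor) are all hypothetical.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # rows of grey-level pixels, shape H x W


def crop_columns(image: Image, left: int, right: int) -> Image:
    """Return columns [left, right) of the image, clipped to its borders."""
    width = len(image[0])
    left, right = max(0, left), min(width, right)
    return [row[left:right] for row in image]


def mean_intensity(window: Image) -> float:
    """Placeholder feature (assumption): stands in for a learnt extractor."""
    pixels = [p for row in window for p in row]
    return sum(pixels) / len(pixels)


def multi_scale_scan(
    image: Image,
    scales: Sequence[float] = (0.5, 1.0, 1.5),  # window width / image height (assumed values)
    stride: int = 2,                            # horizontal step in pixels (assumed value)
    feature: Callable[[Image], float] = mean_intensity,
) -> List[List[float]]:
    """Slide windows of several widths along the text image.

    At each horizontal position, one full-height window per scale is
    cropped around the position and mapped to a feature value. The result
    is a left-to-right sequence with one feature vector per position.
    """
    height, width = len(image), len(image[0])
    sequence = []
    for centre in range(0, width, stride):
        frame = []
        for scale in scales:
            half = max(1, round(scale * height / 2))  # half-width of the window
            window = crop_columns(image, centre - half, centre + half)
            frame.append(feature(window))
        sequence.append(frame)
    return sequence
```

In a pipeline of the kind the abstract describes, the resulting sequence would be passed to a recurrent model that labels it as a whole, so no character boundaries need to be chosen beforehand. The scalar placeholder would be replaced by the learnt feature vectors for each window.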
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Elagouni, K., Garcia, C., Mamalet, F., Sébillot, P. (2012). Text Recognition in Videos Using a Recurrent Connectionist Approach. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds) Artificial Neural Networks and Machine Learning – ICANN 2012. ICANN 2012. Lecture Notes in Computer Science, vol 7553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33266-1_22
DOI: https://doi.org/10.1007/978-3-642-33266-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33265-4
Online ISBN: 978-3-642-33266-1
eBook Packages: Computer Science, Computer Science (R0)