Abstract
This paper addresses the problem of text detection and recognition in videos, focusing on two major issues that make it difficult to extract information from video captured by a moving vehicle. The first is motion blur, which is prevalent in such video and prevents accurate recognition of text. The second is the orientation of the text being detected, which may not be in the same plane. We propose a novel end-to-end pipeline of deep networks. It consists of a fully convolutional network to detect text, a Generative Adversarial Network to remove motion blur, a rectification network that uses Thin-Plate Spline transformations and a Spatial Transformer Network to handle text that is not straight (i.e., perspective or curved text), and a recognition network to read the text. We deblur only the regions around text boxes rather than complete images. We also track the text boxes across frames to avoid re-recognizing text in consecutive frames. This significantly improves the performance of the system, as shown by the higher classification scores achieved compared with the state of the art.
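The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the four stage callables (`detect`, `deblur`, `rectify`, `recognize`) are hypothetical placeholders for the trained networks, the class and helper names are invented here, and simple IoU matching against the previous frame stands in for whatever tracker the paper actually uses. It shows only the two ideas the abstract states: deblurring just the cropped text region, and skipping recognition for boxes already seen.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
Frame = List[List[int]]          # toy grayscale image: rows of pixel values


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def crop(frame: Frame, box: Box, pad: int = 2) -> Frame:
    """Cut out the box plus a small margin, clipped to the frame borders."""
    h, w = len(frame), len(frame[0])
    x1, y1 = max(0, box[0] - pad), max(0, box[1] - pad)
    x2, y2 = min(w, box[2] + pad), min(h, box[3] + pad)
    return [row[x1:x2] for row in frame[y1:y2]]


@dataclass
class TextPipeline:
    # Hypothetical stand-ins for the detection, deblurring,
    # rectification and recognition networks.
    detect: Callable[[Frame], List[Box]]
    deblur: Callable[[Frame], Frame]
    rectify: Callable[[Frame], Frame]
    recognize: Callable[[Frame], str]
    iou_threshold: float = 0.5  # assumed value, not taken from the paper
    tracks: List[Tuple[Box, str]] = field(default_factory=list)

    def _tracked_text(self, box: Box) -> Optional[str]:
        """Return the text of a previous-frame box that overlaps this one."""
        for prev_box, text in self.tracks:
            if iou(prev_box, box) >= self.iou_threshold:
                return text
        return None

    def process(self, frame: Frame) -> List[Tuple[Box, str]]:
        results = []
        for box in self.detect(frame):
            text = self._tracked_text(box)
            if text is None:
                # Only the region around the text box is deblurred,
                # never the complete frame.
                patch = crop(frame, box)
                text = self.recognize(self.rectify(self.deblur(patch)))
            results.append((box, text))
        self.tracks = results  # remember boxes for the next frame
        return results
```

Calling `process` once per frame keeps the expensive deblurring and recognition steps off boxes that persist between consecutive frames, which is the saving the abstract attributes to tracking.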
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Anand, S., Susan, S., Aggarwal, S., Aggarwal, S., Singla, R. (2021). Scene Text Recognition in the Wild with Motion Deblurring Using Deep Networks. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1378. Springer, Singapore. https://doi.org/10.1007/978-981-16-1103-2_9
DOI: https://doi.org/10.1007/978-981-16-1103-2_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1102-5
Online ISBN: 978-981-16-1103-2
eBook Packages: Computer Science (R0)