Abstract
Text extraction from videos is an emerging research field in the document analysis community. We propose a simple Convolutional Recurrent Neural Network (CRNN) to recognize text in both Arabic and Urdu scripts. We apply a wide variety of data augmentation techniques to help the model generalize and to prevent overfitting. We also use a slightly improved loss function that helps the model converge faster. With the proposed method we achieve a character recognition rate (CRR) of 99.73%, a word recognition rate (WRR) of 88.37% and a line recognition rate (LRR) of 89.92% on the Urdu Ticker Text dataset, and 96.82% CRR, 90.41% WRR and 76.78% LRR on the AcTiVComp20 dataset. The proposed method significantly outperforms the Google Vision API on both datasets.
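The abstract reports results as CRR, WRR and LRR but does not define them. The following is a minimal sketch of how such rates are commonly computed from reference and recognized text lines, not the authors' evaluation code. It assumes that CRR and WRR are edit-distance based (at character and word level respectively) and that LRR is the share of lines recognized without any error; the function names `edit_distance` and `recognition_rates` are hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def recognition_rates(refs: List[str], hyps: List[str]) -> Dict[str, float]:
    """Return CRR, WRR and LRR in percent (assumed definitions, see lead-in).

    Characters are counted as Python code points, so combining marks in
    Arabic or Urdu text count as separate characters here.
    """
    if len(refs) != len(hyps) or not refs:
        raise ValueError("refs and hyps must be non-empty and equally long")
    pairs = list(zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    total_words = sum(len(r.split()) for r in refs)
    char_errors = sum(edit_distance(r, h) for r, h in pairs)
    word_errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    exact_lines = sum(1 for r, h in pairs if r == h)
    return {
        "CRR": 100.0 * (total_chars - char_errors) / total_chars,
        "WRR": 100.0 * (total_words - word_errors) / total_words,
        "LRR": 100.0 * exact_lines / len(refs),
    }


if __name__ == "__main__":
    # Toy data: one substituted character in the first of two lines.
    print(recognition_rates(["abc def", "ghi"], ["abc dxf", "ghi"]))
```

A single wrong character makes the whole word and the whole line count as errors, which is why LRR is typically the strictest of the three rates.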
Notes
1. The Urdu Ticker Text Dataset will be available publicly at https://tukl.seecs.nust.edu.pk/downloads.html.
Acknowledgement
This work has been partially funded by the Higher Education Commission of Pakistan’s grant for National Center of Artificial Intelligence (NCAI).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rehman, A., Ul-Hasan, A., Shafait, F. (2021). High Performance Urdu and Arabic Video Text Recognition Using Convolutional Recurrent Neural Networks. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12916. Springer, Cham. https://doi.org/10.1007/978-3-030-86198-8_24
DOI: https://doi.org/10.1007/978-3-030-86198-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86197-1
Online ISBN: 978-3-030-86198-8
eBook Packages: Computer Science, Computer Science (R0)