High Performance Urdu and Arabic Video Text Recognition Using Convolutional Recurrent Neural Networks

Rehman, Abdul; Ul-Hasan, Adnan; Shafait, Faisal

doi:10.1007/978-3-030-86198-8_24

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12916)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Text extraction from videos is an emerging research field in the document analysis community. We propose a simple Convolutional Recurrent Neural Network (CRNN) to recognize text in both Arabic and Urdu scripts. We use a wide variety of data augmentation techniques to help the model generalize and to prevent over-fitting. We also use a slightly improved loss function that helps the model converge faster. With the proposed method, we achieve a character recognition rate (CRR) of 99.73%, a word recognition rate (WRR) of 88.37%, and a line recognition rate (LRR) of 89.92% on the Urdu Ticker Text dataset, and 96.82% CRR, 90.41% WRR, and 76.78% LRR on the AcTiVComp20 dataset. The proposed method significantly outperforms the Google Vision API on both datasets.
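
The three metrics reported in the abstract can be illustrated with a minimal sketch. It assumes that CRR is computed from the character-level edit distance between each recognized line and its ground truth, that WRR uses the same idea at word level, and that LRR is the share of lines recognized without any error. The chapter's exact evaluation protocol is not given on this page, so these definitions and the names `edit_distance` and `recognition_rates` are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def recognition_rates(predictions: List[str],
                      references: List[str]) -> Tuple[float, float, float]:
    """Return (CRR, WRR, LRR) in percent over paired text lines.

    Assumed definitions (hypothetical, see the lead-in):
      CRR = (chars - char edit errors) / chars
      WRR = (words - word edit errors) / words
      LRR = exactly matching lines / lines
    """
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")

    char_errors = word_errors = 0
    total_chars = total_words = exact_lines = 0
    for pred, ref in zip(predictions, references):
        char_errors += edit_distance(pred, ref)
        total_chars += len(ref)
        pred_words, ref_words = pred.split(), ref.split()
        word_errors += edit_distance(pred_words, ref_words)
        total_words += len(ref_words)
        exact_lines += int(pred == ref)

    if total_chars == 0 or total_words == 0:
        raise ValueError("references must contain text")

    crr = 100.0 * (total_chars - char_errors) / total_chars
    wrr = 100.0 * (total_words - word_errors) / total_words
    lrr = 100.0 * exact_lines / len(references)
    return crr, wrr, lrr


if __name__ == "__main__":
    # Toy Latin-script lines; Urdu or Arabic strings work the same way.
    refs = ["abc def", "xyz"]
    preds = ["abc dxf", "xyz"]
    print(recognition_rates(preds, refs))
```

In this toy example one of ten reference characters is wrong, one of three words is wrong, and one of two lines matches exactly, so the sketch yields a CRR of 90%, a WRR of about 66.7%, and an LRR of 50%. Python strings are compared here by Unicode code point; an evaluation for Arabic-script text might instead normalize the text or compare at grapheme level, which this sketch does not attempt.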

Notes

1. The Urdu Ticker Text Dataset will be available publicly at https://tukl.seecs.nust.edu.pk/downloads.html.

Acknowledgement

This work has been partially funded by the Higher Education Commission of Pakistan's grant for the National Center of Artificial Intelligence (NCAI).

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, National University of Sciences and Technology (NUST), Islamabad, Pakistan
Abdul Rehman & Faisal Shafait
Deep Learning Laboratory, National Center of Artificial Intelligence, Lahore, Pakistan
Adnan Ul-Hasan & Faisal Shafait

Corresponding author

Correspondence to Abdul Rehman.

Editor information

Editors and Affiliations

Boise State University, Boise, ID, USA
Elisa H. Barney Smith
Indian Statistical Institute, Kolkata, India
Umapada Pal

Cite this paper

Rehman, A., Ul-Hasan, A., Shafait, F. (2021). High Performance Urdu and Arabic Video Text Recognition Using Convolutional Recurrent Neural Networks. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12916. Springer, Cham. https://doi.org/10.1007/978-3-030-86198-8_24

DOI: https://doi.org/10.1007/978-3-030-86198-8_24
Published: 04 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86197-1
Online ISBN: 978-3-030-86198-8
eBook Packages: Computer Science, Computer Science (R0)
