High Performance Urdu and Arabic Video Text Recognition Using Convolutional Recurrent Neural Networks

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 Workshops (ICDAR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12916)

Abstract

Text extraction from videos is an emerging research field in the document analysis community. We propose a simple Convolutional Recurrent Neural Network (CRNN) for text recognition in both Arabic and Urdu scripts. We apply a wide variety of data augmentation techniques to improve generalization and prevent overfitting. We also use a slightly improved loss function that helps the model converge faster. With the proposed method we achieve a 99.73% character recognition rate (CRR), 88.37% word recognition rate (WRR) and 89.92% line recognition rate (LRR) on the Urdu Ticker Text dataset, and 96.82% CRR, 90.41% WRR and 76.78% LRR on the AcTiVComp20 dataset. The proposed method significantly outperforms the Google Vision API on both datasets.
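The abstract reports results as character, word, and line recognition rates (CRR, WRR, LRR). As a minimal sketch, assuming the definitions commonly used for this kind of benchmark (CRR and WRR derived from Levenshtein edit distance at the character and word level, LRR as the share of lines recognized exactly), the metrics could be computed as below. The function names `edit_distance` and `recognition_rates` are hypothetical, and the paper's exact scoring protocol may differ.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance; insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 0 if symbols match)
            )
        prev = curr
    return prev[-1]


def recognition_rates(refs: List[str], hyps: List[str]) -> Dict[str, float]:
    """Assumed CRR/WRR/LRR definitions, in percent, over aligned text lines.

    CRR and WRR are 1 - (total edit errors / total reference units); they can
    drop below zero if a model inserts many spurious symbols.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned line by line")
    char_errors = char_total = 0
    word_errors = word_total = 0
    exact_lines = 0
    for ref, hyp in zip(refs, hyps):
        # Character level: compare the raw strings.
        char_errors += edit_distance(ref, hyp)
        char_total += len(ref)
        # Word level: compare whitespace-separated token lists.
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errors += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        # Line level: a line counts only if it matches exactly.
        exact_lines += ref == hyp
    return {
        "CRR": 100.0 * (1 - char_errors / char_total),
        "WRR": 100.0 * (1 - word_errors / word_total),
        "LRR": 100.0 * exact_lines / len(refs),
    }
```

Because the functions operate on Python strings, they apply unchanged to Unicode Urdu and Arabic text; whether the benchmarks normalize characters (for example, diacritics) before scoring is not stated here.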

Notes

  1. The Urdu Ticker Text Dataset will be made publicly available at https://tukl.seecs.nust.edu.pk/downloads.html.

Acknowledgement

This work has been partially funded by the Higher Education Commission of Pakistan’s grant for National Center of Artificial Intelligence (NCAI).

Author information

Corresponding author

Correspondence to Abdul Rehman.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rehman, A., Ul-Hasan, A., Shafait, F. (2021). High Performance Urdu and Arabic Video Text Recognition Using Convolutional Recurrent Neural Networks. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12916. Springer, Cham. https://doi.org/10.1007/978-3-030-86198-8_24

  • DOI: https://doi.org/10.1007/978-3-030-86198-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86197-1

  • Online ISBN: 978-3-030-86198-8

  • eBook Packages: Computer Science, Computer Science (R0)