Abstract
Signboard detection and recognition is an important task in automated context-aware marketing. Recently, text in scripts such as Latin, Japanese, and Chinese has been detected effectively by several machine learning algorithms. Compared with these scripts, outdoor Urdu text needs further attention in detection and recognition because of its cursive nature. Urdu detection and recognition are also difficult because of wide variation in illumination, low resolution, and inconsistent font styles, colors, and backgrounds. To address the shortage of resources for Urdu text detection in outdoor environments, we propose a new Urdu-text signboard dataset with 467 ligature categories, containing more than 30K images for recognition and 700 annotated base images for detection. We also propose a methodology that consists of three phases. In the first phase, text regions containing Urdu ligatures in shop-signboard images are detected by a faster region-based convolutional neural network (Faster R-CNN) using pre-trained CNNs such as AlexNet and VGG16. In the second phase, the regions detected in the first phase are clustered to identify the unique ligatures in the dataset. In the third phase, all detected regions are recognized by a trained 18-layer convolutional neural network. The proposed system achieves a precision of 87% and a recall of 96% for detection using the VGG16 model. For the classification of ligatures, a recognition rate of 97.50% is achieved. Ligature recognition was also evaluated using the bilingual evaluation understudy (BLEU) metric and achieved an encouraging score of 0.96 on the newly developed Urdu-Signboard dataset.
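As an illustration of how detection precision and recall of the kind reported above are commonly computed, the following Python sketch matches detected boxes to annotated boxes by intersection over union (IoU). It is a minimal sketch and not the authors' evaluation code: the abstract does not state the matching rule, so the greedy one-to-one matching, the 0.5 IoU threshold, and the names `iou` and `precision_recall` are all assumptions made for this example.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    detections: List[Box],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[float, float]:
    """Greedily match each detection to its best unmatched ground-truth box."""
    matched = set()
    true_positives = 0
    for det in detections:
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            score = iou(det, gt)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    false_positives = len(detections) - true_positives
    false_negatives = len(ground_truth) - true_positives
    precision = true_positives / (true_positives + false_positives) if detections else 0.0
    recall = true_positives / (true_positives + false_negatives) if ground_truth else 0.0
    return precision, recall
```

Under this scheme, a detector that finds every annotated ligature region but also emits some spurious boxes has high recall and lower precision, which is the pattern in the figures reported above (96% recall, 87% precision).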














Acknowledgments
The authors would like to acknowledge the Higher Education Commission (HEC) for supporting this work under NRPU Project No. 6338. This work was also supported by FCT/MCTES through national funds and, when applicable, co-funded by EU funds under Project UIDB/EEA/50008/2020, and by the Brazilian National Council for Research and Development (CNPq) via Grant No. 309335/2017-5.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Arafat, S.Y., Ashraf, N., Iqbal, M.J. et al. Urdu signboard detection and recognition using deep learning. Multimed Tools Appl 81, 11965–11987 (2022). https://doi.org/10.1007/s11042-020-10175-2
DOI: https://doi.org/10.1007/s11042-020-10175-2