Benchmarked multi-script Thai scene text dataset and its multi-class detection solution

Multimedia Tools and Applications

Abstract

Detecting text in scene images is a prevalent research topic. The task is challenging, and detectors do not transfer readily from one setting to another, because a single scene image can contain multiple scripts, each with different properties. It is therefore important to study scene text detection with respect to geographical location and the scripts found there. As no work on large-scale multi-script Thai scene text detection was found in the literature, this study focuses on multi-script text that includes Thai, English (Roman), Chinese or Chinese-like script, and Arabic, all of which are commonly seen around Thailand. Compared with the Roman/English script, Thai script has more consonants and vowels, and it has its own numerals. Furthermore, the placement of letters, intonation marks, and vowels differs from that of English or Chinese-like scripts, which makes Thai text challenging to detect and recognise. This study proposes a multi-script dataset covering the aforementioned scripts and numerals, along with a benchmark employing the Single Shot Multi-Box Detector (SSD) and Faster Regions with Convolutional Neural Networks (F-RCNN). The dataset consists of 600 scene images recorded in Thailand, together with their manual detection annotations. The study also proposes a detection technique that treats multi-script scene text detection as a multi-class detection problem, which was found to be more effective than legacy approaches. Experiments applying the proposed technique to the dataset achieved encouraging precision and recall rates compared with such methods. The proposed dataset is available upon email request to the corresponding authors.
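The multi-class formulation and the precision/recall evaluation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' evaluation protocol: the class names in `SCRIPT_CLASSES`, the `(x_min, y_min, x_max, y_max)` box format, the IoU threshold of 0.5, and the greedy score-ordered matching are all assumptions chosen for illustration. The point it shows is that, once each script is its own class, a box placed correctly but labelled with the wrong script counts as an error.

```python
from collections import defaultdict

# Hypothetical class list: one detection class per script, plus numerals.
SCRIPT_CLASSES = ["thai", "roman", "chinese_like", "arabic", "numeral"]


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def per_class_precision_recall(ground_truth, detections, iou_threshold=0.5):
    """Return {label: (precision, recall)} for every class in SCRIPT_CLASSES.

    ground_truth: list of (label, box)
    detections:   list of (label, score, box)
    A detection is a true positive only if it overlaps a not-yet-matched
    ground-truth box of the SAME class by at least iou_threshold.
    """
    tp = defaultdict(int)
    fp = defaultdict(int)
    gt_count = defaultdict(int)
    for label, _box in ground_truth:
        gt_count[label] += 1

    matched = set()
    # Greedy matching (an assumption): highest-scoring detections go first.
    for label, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_idx, best_iou = None, iou_threshold
        for idx, (gt_label, gt_box) in enumerate(ground_truth):
            if idx in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is None:
            fp[label] += 1
        else:
            matched.add(best_idx)
            tp[label] += 1

    results = {}
    for label in SCRIPT_CLASSES:
        n_det = tp[label] + fp[label]
        precision = tp[label] / n_det if n_det else 0.0
        recall = tp[label] / gt_count[label] if gt_count[label] else 0.0
        results[label] = (precision, recall)
    return results
```

Calling `per_class_precision_recall(ground_truth, detections)` on the annotations and detector outputs for one image, or for a whole test split concatenated per image, yields one precision/recall pair per script. Classes with no detections or no ground-truth boxes report 0.0 here, which is a simplification rather than a standard convention.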

Author information

Corresponding authors

Correspondence to Hemmaphan Suwanwiwat or Abhijit Das.

Additional information

Hemmaphan Suwanwiwat and Abhijit Das contributed equally.

About this article

Cite this article

Suwanwiwat, H., Das, A., Saqib, M. et al. Benchmarked multi-script Thai scene text dataset and its multi-class detection solution. Multimed Tools Appl 80, 11843–11863 (2021). https://doi.org/10.1007/s11042-020-10143-w

  • DOI: https://doi.org/10.1007/s11042-020-10143-w
