Abstract
Detecting text in scene images is an active research topic. The task is challenging, and solutions are not readily interoperable, because a single scene image may contain several scripts, each with different properties. It is therefore important to study scene text detection for the scripts found in a given geographical location. No work on large-scale multi-script Thai scene text detection was found in the literature, so this study focuses on multi-script text comprising Thai, English (Roman), Chinese or Chinese-like script, and Arabic, all of which are commonly seen around Thailand. Thai script has more consonants and vowels than the Roman/English script and has its own numerals. Furthermore, the placement of letters, tone marks, and vowels differs from that of English or Chinese-like scripts, which makes Thai text challenging to detect and recognise. This study proposes a multi-script dataset covering the aforementioned scripts and numerals, together with a benchmark using the Single Shot MultiBox Detector (SSD) and Faster Regions with Convolutional Neural Networks (Faster R-CNN). The dataset consists of 600 scene images recorded in Thailand, together with their manual detection annotations. The study also proposes a detection technique that formulates multi-script scene text detection as a multi-class detection problem, which was found to be more effective than legacy approaches. Experiments applying the proposed technique to the dataset achieved encouraging precision and recall rates compared with those approaches. The proposed dataset is available upon email request to the corresponding authors.
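To make the multi-class formulation concrete, the following minimal sketch treats each script as its own detection class and computes precision and recall per script. It is an illustration only, not the evaluation protocol used in the study: the script labels, the box format, the greedy highest-confidence-first matching, and the 0.5 IoU threshold are all assumptions introduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Detection = Tuple[str, Box, float]        # (script label, box, confidence)
GroundTruth = Tuple[str, Box]             # (script label, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def per_script_precision_recall(
    detections: List[Detection],
    ground_truth: List[GroundTruth],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> Dict[str, Tuple[float, float]]:
    """Return {script: (precision, recall)} for one image."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    matched = set()
    # Greedy matching: most confident detections claim ground-truth boxes first.
    for label, box, _score in sorted(detections, key=lambda d: d[2], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, (gt_label, gt_box) in enumerate(ground_truth):
            # A detection can only match an unclaimed box of the same script class.
            if idx in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp[label] += 1
        else:
            fp[label] += 1
    # Every ground-truth box left unclaimed is a miss for its script.
    for idx, (gt_label, _box) in enumerate(ground_truth):
        if idx not in matched:
            fn[gt_label] += 1
    results = {}
    for label in set(tp) | set(fp) | set(fn):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        results[label] = (precision, recall)
    return results


# Hypothetical annotations for one image (labels and coordinates are made up).
ground_truth = [
    ("thai", (10, 10, 110, 40)),
    ("roman", (10, 60, 90, 85)),
    ("chinese", (120, 10, 200, 40)),
]
detections = [
    ("thai", (12, 11, 108, 41), 0.92),
    ("roman", (10, 60, 90, 85), 0.88),
    ("thai", (118, 9, 201, 41), 0.75),  # Chinese text mislabelled as Thai
]
results = per_script_precision_recall(detections, ground_truth)
```

In this toy example the third detection localises the Chinese text well but assigns the wrong script, so it counts as a false positive for Thai and leaves the Chinese box undetected. This is the practical consequence of treating scripts as separate classes: a detection must be correct in both location and script to count.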
Additional information
Hemmaphan Suwanwiwat and Abhijit Das contributed equally.
About this article
Cite this article
Suwanwiwat, H., Das, A., Saqib, M. et al. Benchmarked multi-script Thai scene text dataset and its multi-class detection solution. Multimed Tools Appl 80, 11843–11863 (2021). https://doi.org/10.1007/s11042-020-10143-w
DOI: https://doi.org/10.1007/s11042-020-10143-w