ICDAR 2021 Competition on Script Identification in the Wild

  • Conference paper in Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Abstract

This paper summarises the 1st Competition on Script Identification in the Wild (SIW 2021), organised in conjunction with the 16th International Conference on Document Analysis and Recognition (ICDAR 2021). The goal of SIW is to evaluate the limits of script identification approaches on a large-scale in-the-wild database covering 13 scripts (the MDIW-13 dataset) and two scenarios (handwritten and printed). The competition comprises three tasks that differ in the nature of the data used for training and testing. Nineteen research groups registered for SIW 2021, of which six teams from academia and industry took part in the final round and submitted a total of 166 algorithms for scoring. Submissions included a wide variety of deep-learning solutions as well as approaches based on standard image-processing techniques. The performance achieved by the participants demonstrates the higher accuracy of deep-learning methods compared with traditional statistical approaches. The best approach obtained classification accuracies of 99% in all three tasks, in experiments over more than 50K test samples. The results suggest that there is still room for improvement, especially on handwritten samples and specific scripts.
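
To make the evaluation concrete, the sketch below shows how the ranking metric, classification accuracy over the test samples, can be computed both overall and per script. This is a minimal illustration in Python, not the organisers' actual scoring code; all function and variable names are ours.

    # Minimal sketch of the ranking metric: classification accuracy over
    # submitted predictions. Assumes parallel lists of ground-truth and
    # predicted script labels; names are illustrative, not from SIW 2021.
    from collections import defaultdict

    def classification_accuracy(y_true, y_pred):
        """Overall accuracy: fraction of samples whose script is identified correctly."""
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        return correct / len(y_true)

    def per_script_accuracy(y_true, y_pred):
        """Accuracy broken down by script, e.g. over the 13 MDIW-13 scripts."""
        totals, hits = defaultdict(int), defaultdict(int)
        for t, p in zip(y_true, y_pred):
            totals[t] += 1
            hits[t] += int(t == p)
        return {script: hits[script] / totals[script] for script in totals}

    # Toy usage: two test samples, one correct prediction -> 0.5 accuracy.
    print(classification_accuracy(["Latin", "Thai"], ["Latin", "Arabic"]))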

Notes

  1. https://gpds.ulpgc.es/.

  2. 56 is 224/4, where 224 is the CNN's default input width.
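
As a quick illustration of this footnote's arithmetic (treating 56 as, e.g., a patch width is our assumption; the page does not say how the value is used):

    # Footnote arithmetic only: a quarter of the 224-pixel default CNN input width.
    CNN_INPUT_WIDTH = 224                 # default input width of common CNN backbones
    PATCH_WIDTH = CNN_INPUT_WIDTH // 4    # hypothetical use of the resulting value 56
    assert PATCH_WIDTH == 56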

Author information

Correspondence to Abhijit Das.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Das, A., et al. (2021). ICDAR 2021 Competition on Script Identification in the Wild. In: Lladós, J., Lopresti, D., Uchida, S. (eds.) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol. 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_49

  • DOI: https://doi.org/10.1007/978-3-030-86337-1_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86336-4

  • Online ISBN: 978-3-030-86337-1

  • eBook Packages: Computer Science, Computer Science (R0)
