
ICDAR 2023 Competition on Video Text Reading for Dense and Small Text

  • Conference paper
  • In: Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Abstract

Video text detection, tracking, and recognition in natural scenes have recently attracted considerable attention in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size and density) and a single scenario, while ignoring the challenges of extreme video text, i.e., dense and small text in various scenarios. In this competition report, we establish a video text reading benchmark, named DSText, which focuses on reading dense and small text in videos across diverse scenarios. Compared with previous datasets, the proposed dataset introduces three new challenges: 1) dense video text, a new challenge for video text spotters; 2) a high proportion of small text; and 3) various new scenarios, e.g., ‘Game’, ‘Sports’, etc. DSText includes 100 video clips from 12 open scenarios and supports two tasks, i.e., video text tracking (Task 1) and end-to-end video text spotting (Task 2). During the competition period (opened on 15 February 2023 and closed on 20 March 2023), a total of 24 teams participated in the two proposed tasks, submitting around 30 valid results. In this article, we describe detailed statistics of the dataset, the tasks, the evaluation protocols, and a summary of the results of the ICDAR 2023 DSText competition. We hope the benchmark will promote video text research in the community.
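
For context on how Task 1 submissions are ranked, the sketch below shows a CLEAR-MOT accuracy (MOTA) computation. It is a minimal illustration assuming the tracking task is scored with MOTChallenge-style metrics, as the evaluation protocol linked in the notes suggests; the function name `mota` and the toy counts are ours, not the official evaluation code.

```python
# Minimal sketch (not the official DSText scorer): CLEAR-MOT tracking accuracy,
# assuming Task 1 is evaluated with MOTChallenge-style metrics as indicated by
# the evaluation protocol linked in the notes below.

def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt_boxes: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, accumulated over all frames."""
    if num_gt_boxes <= 0:
        raise ValueError("ground truth must contain at least one text box")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt_boxes


if __name__ == "__main__":
    # Toy counts for illustration only; not taken from the competition results.
    score = mota(false_negatives=120, false_positives=80,
                 id_switches=10, num_gt_boxes=1000)
    print(f"MOTA = {score:.3f}")  # -> MOTA = 0.790
```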


Notes

  1. https://rrc.cvc.uab.es/?ch=22&com=introduction

  2. https://rrc.cvc.uab.es/?ch=3&com=evaluation&task=1

  3. https://rrc.cvc.uab.es/?ch=22&com=downloads

  4. https://rrc.cvc.uab.es/?ch=22&com=introduction

  5. https://github.com/ageitgey/face_recognition

  6. https://www.tutorialspoint.com/opencv/opencv_gaussian_blur.htm


Acknowledgements

This competition is supported by the National Natural Science Foundation of China (NSFC #62225603).

Author information


Corresponding author

Correspondence to Weijia Wu.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wu, W. et al. (2023). ICDAR 2023 Competition on Video Text Reading for Dense and Small Text. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14188. Springer, Cham. https://doi.org/10.1007/978-3-031-41679-8_23


  • DOI: https://doi.org/10.1007/978-3-031-41679-8_23


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41678-1

  • Online ISBN: 978-3-031-41679-8

  • eBook Packages: Computer Science, Computer Science (R0)
