EEM: An End-to-end Evaluation Metric for Scene Text Detection and Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12824)

Included in the following conference series: Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Abstract

An objective and fair evaluation metric is fundamental to scene text detection and recognition research. Existing metrics cannot properly handle the one-to-many and many-to-one matchings that arise naturally from inconsistent bounding-box granularity. They also rely on thresholds to match ground-truth and detection boxes, which leads to unstable matching results. In this paper, we propose a novel End-to-end Evaluation Metric (EEM) to tackle these problems. EEM handles one-to-many and many-to-one matching cases more reasonably and is threshold-free. We design a simple yet effective method to find matching groups among the ground-truth and detection boxes in an image. We further employ a label merging method and use normalized scores to evaluate the performance of end-to-end text recognition methods more fairly. We conduct extensive experiments on the ICDAR2015 and RCTW datasets, as well as a new general OCR dataset covering 17 categories of real-life scenes. Experimental results demonstrate the effectiveness and fairness of the proposed evaluation metric.
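The abstract outlines three ingredients: forming matching groups between ground-truth (GT) and detection (DT) boxes without thresholds, merging the text labels within each group, and scoring the merged strings with a normalized measure. The sketch below is only an illustration of that pipeline under assumptions of our own, not the paper's actual algorithm: the grouping rule (connecting boxes by any positive overlap), the left-to-right label merge, and the `normalized_score` definition (1 minus normalized edit distance) are all stand-ins.

```python
from itertools import combinations


def intersects(a, b):
    """Axis-aligned boxes as (x1, y1, x2, y2); True if they overlap at all."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])


def matching_groups(gt_boxes, dt_boxes):
    """Group GT and DT boxes into connected components of overlapping boxes.

    Illustrative grouping rule only: any positive overlap links two boxes,
    so no IoU threshold is involved. One component may hold several GT and/or
    several DT boxes (one-to-many / many-to-one cases), or boxes of one kind
    only (misses and false positives).
    """
    boxes = [("gt", i, b) for i, b in enumerate(gt_boxes)] + \
            [("dt", i, b) for i, b in enumerate(dt_boxes)]
    parent = list(range(len(boxes)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, (_, _, bi)), (j, (_, _, bj)) in combinations(enumerate(boxes), 2):
        if intersects(bi, bj):
            parent[find(i)] = find(j)

    groups = {}
    for idx, (kind, orig_idx, _) in enumerate(boxes):
        groups.setdefault(find(idx), {"gt": [], "dt": []})[kind].append(orig_idx)
    return list(groups.values())


def merge_labels(indices, boxes, labels):
    """Concatenate a group's labels left-to-right; an empty group yields ''."""
    return "".join(labels[i] for i in sorted(indices, key=lambda i: boxes[i][0]))


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def normalized_score(gt_text, dt_text):
    """Assumed stand-in for the paper's score: 1 - normalized edit distance."""
    if not gt_text and not dt_text:
        return 1.0
    return 1.0 - edit_distance(gt_text, dt_text) / max(len(gt_text), len(dt_text))
```

Groups that contain only GT boxes or only DT boxes fall out of the same procedure; the empty-label convention described in the Notes below then applies to them.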

Notes

  1. If there is no GT or DT box in a matching group, the GT or DT label is an empty string.
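Under the illustrative scoring sketch above (again an assumption, not the paper's exact definition), this empty-string convention means an unmatched group contributes a zero score:

```python
# Hypothetical unmatched groups, scored with the sketch's normalized_score:
print(normalized_score("", "shop"))   # 0.0 -> DT-only group (false positive)
print(normalized_score("hotel", ""))  # 0.0 -> GT-only group (missed text)
```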

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hao, J., et al. (2021). EEM: An End-to-end Evaluation Metric for Scene Text Detection and Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_7

  • DOI: https://doi.org/10.1007/978-3-030-86337-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86336-4

  • Online ISBN: 978-3-030-86337-1

  • eBook Packages: Computer Science, Computer Science (R0)
