EEM: An End-to-end Evaluation Metric for Scene Text Detection and Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12824)

Included in the following conference series: Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Abstract

An objective and fair evaluation metric is fundamental to scene text detection and recognition research. Existing metrics cannot properly handle the one-to-many and many-to-one matchings that arise naturally from inconsistent bounding-box granularity. They also rely on thresholds to match ground-truth and detection boxes, which leads to unstable matching results. In this paper, we propose a novel End-to-end Evaluation Metric (EEM) to tackle these problems. EEM handles one-to-many and many-to-one matching cases more reasonably and is threshold-free. We design a simple yet effective method to find matching groups among the ground-truth and detection boxes in an image. We further employ a label merging method and use normalized scores to evaluate the performance of end-to-end text recognition methods more fairly. We conduct extensive experiments on the ICDAR2015 and RCTW datasets, as well as a new general OCR dataset covering 17 categories of real-life scenes. Experimental results demonstrate the effectiveness and fairness of the proposed evaluation metric.
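The abstract outlines three ingredients: forming matching groups between ground-truth (GT) and detection (DT) boxes without thresholds, merging the text labels within each group, and scoring the merged strings with a normalized measure. The sketch below is only an illustration of that pipeline under assumptions of our own, not the paper's actual algorithm: the grouping rule (connecting boxes by any positive overlap), the left-to-right label merge, and the `normalized_score` definition (1 minus normalized edit distance) are all stand-ins.

```python
from itertools import combinations


def intersects(a, b):
    """Axis-aligned boxes as (x1, y1, x2, y2); True if they overlap at all."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])


def matching_groups(gt_boxes, dt_boxes):
    """Group GT and DT boxes into connected components of overlapping boxes.

    Illustrative grouping rule only: any positive overlap links two boxes,
    so no IoU threshold is involved. One component may hold several GT and/or
    several DT boxes (one-to-many / many-to-one cases), or boxes of one kind
    only (misses and false positives).
    """
    boxes = [("gt", i, b) for i, b in enumerate(gt_boxes)] + \
            [("dt", i, b) for i, b in enumerate(dt_boxes)]
    parent = list(range(len(boxes)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, (_, _, bi)), (j, (_, _, bj)) in combinations(enumerate(boxes), 2):
        if intersects(bi, bj):
            parent[find(i)] = find(j)

    groups = {}
    for idx, (kind, orig_idx, _) in enumerate(boxes):
        groups.setdefault(find(idx), {"gt": [], "dt": []})[kind].append(orig_idx)
    return list(groups.values())


def merge_labels(indices, boxes, labels):
    """Concatenate a group's labels left-to-right; an empty group yields ''."""
    return "".join(labels[i] for i in sorted(indices, key=lambda i: boxes[i][0]))


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def normalized_score(gt_text, dt_text):
    """Assumed stand-in for the paper's score: 1 - normalized edit distance."""
    if not gt_text and not dt_text:
        return 1.0
    return 1.0 - edit_distance(gt_text, dt_text) / max(len(gt_text), len(dt_text))
```

Groups that contain only GT boxes or only DT boxes fall out of the same procedure; the empty-label convention described in the Notes below then applies to them.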

Notes

  1. If there is no GT or DT box in a matching group, the GT or DT label is an empty string.
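Under the illustrative scoring sketch above (again an assumption, not the paper's exact definition), this empty-string convention means an unmatched group contributes a zero score:

```python
# Hypothetical unmatched groups, scored with the sketch's normalized_score:
print(normalized_score("", "shop"))   # 0.0 -> DT-only group (false positive)
print(normalized_score("hotel", ""))  # 0.0 -> GT-only group (missed text)
```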

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hao, J., et al. (2021). EEM: An End-to-end Evaluation Metric for Scene Text Detection and Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_7

  • DOI: https://doi.org/10.1007/978-3-030-86337-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86336-4

  • Online ISBN: 978-3-030-86337-1

  • eBook Packages: Computer Science, Computer Science (R0)
