Abstract
Scene text detection in the wild is an active research area in computer vision and has achieved great progress with the aid of deep learning. However, training deep text detection models requires large amounts of annotations such as bounding boxes and quadrangles, which are laborious and expensive to obtain. Although synthetic data is easier to acquire, models trained on it exhibit a large performance gap compared with those trained on real data because of domain shift. To address this problem, we propose a novel two-stage framework for cost-efficient scene text detection. Specifically, to unleash the power of synthetic data, we design an unsupervised domain adaptation scheme consisting of Entropy-aware Global Transfer (EGT) and Text Region Transfer (TRT) to pre-train the model. Furthermore, we fine-tune the model with a minimal set of actively annotated real samples together with enhanced pseudo-labeled ones, aiming to save annotation cost. In this framework, both the diversity of the synthetic data and the realism of the unlabeled real data are fully exploited. Extensive experiments on various benchmarks show that the proposed framework significantly outperforms the baseline and achieves desirable performance with only a few labeled real samples.
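To make the active-annotation idea in the fine-tuning stage concrete, the sketch below illustrates one common selection criterion, entropy-based uncertainty sampling: unlabeled images whose predicted text probabilities are least confident are ranked first for manual labeling. This is a minimal, self-contained illustration under assumed inputs (toy per-pixel text-probability maps); the function names and data are hypothetical and do not reproduce the paper's actual EGT/TRT components or annotation pipeline.

```python
import math

def binary_entropy(p):
    """Entropy (natural log) of a text/non-text probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def select_uncertain(prob_maps, k):
    """Rank unlabeled images by mean prediction entropy and return the
    indices of the top-k most uncertain ones to send for annotation."""
    ranked = sorted(
        range(len(prob_maps)),
        key=lambda i: sum(binary_entropy(p) for p in prob_maps[i]) / len(prob_maps[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy text-probability maps for three unlabeled images.
maps = [
    [0.01, 0.99, 0.02],  # confident predictions -> low entropy
    [0.50, 0.45, 0.55],  # ambiguous predictions -> high entropy
    [0.90, 0.10, 0.85],
]
print(select_uncertain(maps, 1))  # -> [1]
```

Confident images (probabilities near 0 or 1) contribute little entropy, so the annotation budget is spent on genuinely ambiguous samples; the remaining confident predictions can instead serve as pseudo labels.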
Acknowledgments
This work is supported by the Open Research Project of the State Key Laboratory of Media Convergence and Communication, Communication University of China, China (No. SKLMCC2020KF004), the Beijing Municipal Science & Technology Commission (Z191100007119002), the Key Research Program of Frontier Sciences, CAS, Grant NO ZDBS-LY-7024, and the National Natural Science Foundation of China (No. 62006221).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Zeng, G., Zhang, Y., Zhou, Y., Yang, X. (2021). A Cost-Efficient Framework for Scene Text Detection in the Wild. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science(), vol 13031. Springer, Cham. https://doi.org/10.1007/978-3-030-89188-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89187-9
Online ISBN: 978-3-030-89188-6
eBook Packages: Computer Science (R0)