A Text-Specific Domain Adaptive Network for Scene Text Detection in the Wild

He, Xuan; Yuan, Jin; Li, Mengyao; Wang, Runmin; Wang, Haidong; Li, Zhiyong

doi:10.1007/s10489-023-04873-1

Published: 30 August 2023

Volume 53, pages 26827–26839, (2023)
Applied Intelligence

Xuan He¹ (ORCID: orcid.org/0000-0001-9720-5915),
Jin Yuan¹,
Mengyao Li¹,
Runmin Wang²,
Haidong Wang¹ &
Zhiyong Li¹

Abstract

Scene text detection has drawn increasing attention due to its potential scalability to large-scale applications. Currently, a scene text detection model that is well trained on a source domain usually performs unsatisfactorily when it is migrated to a target domain, owing to the large domain shift between them. To bridge this gap, this paper proposes a novel network that integrates a text-specific Faster R-CNN (ts-FRCNN) and text-specific domain adaptation (ts-DA) into one framework. Compared to conventional FRCNN, ts-FRCNN designs a text-specific RPN that generates more accurate region proposals by considering the inherent characteristics of scene text, as well as a text-specific RoI pooling that extracts purer yet sufficient fine-grained text features by adopting an adaptive asymmetric gridding strategy. Compared to conventional domain adaptation, ts-DA adopts a triple-level alignment strategy to reduce the domain shift at the image, word and character levels, and builds a triple-consistency regularization among them, which significantly promotes domain-invariant text feature learning. We conduct extensive experiments on three representative transfer learning tasks: common-to-extreme scenes, real-to-real scenes and synthetic-to-real scenes. The experimental results demonstrate that our model consistently outperforms previous methods.
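
The abstract does not give the form of the triple-consistency regularization, so the following is only an illustrative sketch under stated assumptions, not the authors' formulation. It assumes that each alignment level (image, word, character) has its own domain classifier that outputs the probability of a sample coming from the target domain, and that consistency is encouraged by penalizing the squared disagreement between the level-wise average predictions. The function name `triple_consistency_loss` and the toy inputs are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def triple_consistency_loss(image_probs, word_probs, char_probs):
    """Mean squared disagreement between the average domain probabilities
    predicted at the image, word and character levels.

    Assumed form for illustration only; the paper may define it differently.
    """
    # One summary prediction per level: the mean target-domain probability.
    levels = [fmean(image_probs), fmean(word_probs), fmean(char_probs)]
    # Compare every pair of levels: image-word, image-character, word-character.
    pairs = list(combinations(levels, 2))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


# Toy domain-classifier outputs (probability of "target domain").
image_probs = [0.8, 0.7, 0.9]      # per-location predictions on an image feature map
word_probs = [0.6, 0.8]            # per-word-proposal predictions
char_probs = [0.5, 0.7, 0.6, 0.6]  # per-character predictions

loss = triple_consistency_loss(image_probs, word_probs, char_probs)
print(f"consistency loss: {loss:.4f}")
```

Under this assumed form the penalty is zero when all three levels agree on the domain of an input and grows as their predictions diverge, which is the behaviour a consistency term of this kind is meant to encourage.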

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. U21A20518, No. 61976086).

Author information

Authors and Affiliations

Computer Science and Electronic Engineering, Hunan University, Hunan, China
Xuan He, Jin Yuan, Mengyao Li, Haidong Wang & Zhiyong Li
Information Science and Engineering, Hunan Normal University, Hunan, China
Runmin Wang

Corresponding author

Correspondence to Zhiyong Li.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, X., Yuan, J., Li, M. et al. A Text-Specific Domain Adaptive Network for Scene Text Detection in the Wild. Appl Intell 53, 26827–26839 (2023). https://doi.org/10.1007/s10489-023-04873-1

Accepted: 07 July 2023
Published: 30 August 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10489-023-04873-1
