ABSTRACT
Deep-learning-based scene text detection methods have achieved remarkable success. Because manually annotating datasets is laborious and time-consuming, large amounts of synthetic data have been created and used for training. However, owing to the domain discrepancy between synthetic and real scenes, models trained on synthetic data may suffer performance degradation when applied to real scenes. To tackle this domain shift, we propose the Orthogonal Feature Alignment Network (OFAN), designed specifically for text objects. OFAN incorporates an orthogonal feature enhancement module that strengthens the edge features of text instances and thereby emphasizes the text objects, and it employs adversarial training to align text instances across domains. In addition, a multi-transform self-training mixture technique further improves detection performance in the target domain by mitigating the adverse effects of false positives and false negatives. We extensively evaluate OFAN on four benchmark datasets, and the experimental results demonstrate the effectiveness of our approach.
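The abstract does not spell out how the multi-transform self-training step works, so the following is only a minimal sketch of one common way such a step can reduce false positives in pseudo-labels: keep a detection on an unlabeled target image only if it is confirmed by predictions on a transformed view of the same image. The transform (a horizontal flip), the function names `hflip_box`, `iou`, and `consistent_pseudo_labels`, and the thresholds are all assumptions made for illustration and are not taken from OFAN.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); a simplification,
# since scene text detectors often predict rotated or polygonal regions.
Box = Tuple[float, float, float, float]


def hflip_box(box: Box, image_width: float) -> Box:
    """Map a box predicted on a horizontally flipped image back to original coordinates."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def consistent_pseudo_labels(
    views: List[List[Box]], iou_thr: float = 0.5, min_votes: int = 2
) -> List[Box]:
    """Keep boxes from the first view that at least `min_votes` views agree on.

    `views` holds one list of boxes per transform, already mapped back to the
    original image coordinates. The first view counts as one vote for its own boxes.
    """
    kept = []
    for box in views[0]:
        votes = sum(
            1 for view in views if any(iou(box, other) >= iou_thr for other in view)
        )
        if votes >= min_votes:
            kept.append(box)
    return kept


# Hypothetical detector outputs for one target-domain image of width 100.
original_view = [(10, 10, 50, 30), (60, 60, 90, 80)]
flipped_view = [hflip_box(b, 100) for b in [(50, 10, 90, 30)]]
pseudo_labels = consistent_pseudo_labels([original_view, flipped_view])
```

In this toy input, only the first box is confirmed by the flipped view, so the second is discarded as a likely false positive. A real pipeline would also need a way to recover missed text (false negatives), which this sketch does not attempt.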
Index Terms
- Orthogonal Feature Alignment Network for Cross-Domain Text Detection