DOI: 10.1145/3647649.3647697
research-article

Orthogonal Feature Alignment Network for Cross-Domain Text Detection

Published: 03 May 2024

ABSTRACT

Scene text detection methods based on deep learning have achieved remarkable success. Because manually annotating datasets is laborious and time-consuming, large amounts of synthetic data have been created and used for training. However, owing to the domain discrepancy between synthetic and real scene data, models trained on synthetic data may suffer performance degradation when applied to real scenes. To tackle this domain shift, we propose the Orthogonal Feature Alignment Network (OFAN), designed specifically for text objects. OFAN incorporates an orthogonal feature enhancement module that strengthens the edge features of text instances and thereby emphasizes the text objects, and it employs adversarial training to align text instances across domains. In addition, a multi-transform self-training mixture technique further improves detection performance in the target domain by mitigating the adverse effects of false positives and false negatives. We evaluate OFAN extensively on four benchmark datasets, and the experimental results demonstrate the effectiveness of our approach.
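
The abstract does not spell out how the multi-transform self-training step screens pseudo-labels. As a rough illustration only, the sketch below assumes one plausible scheme: a target-domain box is kept as a pseudo-label if it is confident and is confirmed by a prediction made on a horizontally flipped copy of the image. This is not the authors' implementation. The function names, the choice of a flip as the transform, and both thresholds are hypothetical, and the "mixture" part of the technique is not covered.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, score); axis-aligned boxes are an assumption of this sketch.
Box = Tuple[float, float, float, float, float]


def unflip_horizontal(box: Box, image_width: float) -> Box:
    """Map a box predicted on a horizontally flipped image back to the original frame."""
    x1, y1, x2, y2, score = box
    return (image_width - x2, y1, image_width - x1, y2, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def consistent_pseudo_labels(
    original: List[Box],
    flipped: List[Box],
    image_width: float,
    min_score: float = 0.5,  # hypothetical confidence threshold
    min_iou: float = 0.5,    # hypothetical agreement threshold
) -> List[Box]:
    """Keep confident boxes that a flipped-view prediction also supports."""
    restored = [unflip_horizontal(b, image_width) for b in flipped]
    kept = []
    for box in original:
        if box[4] < min_score:
            continue  # low confidence: likely false positive
        if any(iou(box, other) >= min_iou for other in restored):
            kept.append(box)  # confirmed under the second transform
    return kept


# Hypothetical detector outputs for one 100-pixel-wide target-domain image.
original = [(10, 10, 50, 30, 0.9), (60, 60, 90, 80, 0.8), (5, 70, 25, 85, 0.3)]
flipped = [(50, 10, 90, 30, 0.85)]  # predicted on the flipped image
labels = consistent_pseudo_labels(original, flipped, image_width=100)
```

Here only the first box survives: the second has no counterpart in the flipped view, and the third falls below the confidence threshold. Requiring agreement across transforms is one common way to reduce false positives in pseudo-labels; whether OFAN does exactly this is not stated in the abstract.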


    • Published in

      ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing
      January 2024
      480 pages
      ISBN: 9798400716720
      DOI: 10.1145/3647649

      Copyright © 2024 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited