Background-Insensitive Scene Text Recognition with Text Semantic Segmentation

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13685)

Abstract

Scene Text Recognition (STR) has many important applications in computer vision. Complex backgrounds remain a major challenge for STR because they interfere with text feature extraction. Many existing methods use attentional regions, bounding boxes, or polygons to reduce such interference. However, the text regions located by these methods still contain considerable background interference. In this paper, we propose a background-insensitive approach, BINet, which explicitly leverages text Semantic Segmentation (SSN) to extract text more accurately. SSN is trained on a set of existing segmentation data whose volume is only 0.03% of the STR training data. This avoids the need for large-scale pixel-level annotation of the STR training data. To use the segmentation cues effectively, we design new segmentation refinement and embedding blocks that refine text masks and reinforce visual features. Additionally, we propose an efficient pipeline that uses Synthetic Initialization (SI) for STR models trained only on real data (1.7% of the STR training data), rather than training on both synthetic and real data from scratch. Experiments show that the proposed method recognizes text against complex backgrounds more effectively and achieves state-of-the-art performance on several public datasets.
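
To make the idea of using a text mask to suppress background features concrete, the following is a minimal sketch and not the BINet refinement or embedding block, whose design is not given in the abstract. It assumes a per-pixel text probability mask is already available and simply re-weights each feature channel by it. The function name `apply_text_mask` and the `floor` parameter are hypothetical and introduced only for illustration.

```python
from typing import List

Grid = List[List[float]]  # an H x W grid of floats


def apply_text_mask(features: List[Grid], mask: Grid, floor: float = 0.1) -> List[Grid]:
    """Down-weight background positions in every feature channel.

    features: C channels, each an H x W grid of feature values.
    mask:     H x W grid of text probabilities in [0, 1].
    floor:    hypothetical minimum weight, so background is damped
              rather than erased (an assumption, not from the paper).
    """
    if not 0.0 <= floor <= 1.0:
        raise ValueError("floor must be in [0, 1]")
    weighted = []
    for channel in features:
        same_shape = len(channel) == len(mask) and all(
            len(row) == len(mask_row) for row, mask_row in zip(channel, mask)
        )
        if not same_shape:
            raise ValueError("each feature channel must match the mask shape")
        # Weight runs from `floor` (pure background) to 1.0 (pure text).
        weighted.append([
            [value * (floor + (1.0 - floor) * p) for value, p in zip(row, mask_row)]
            for row, mask_row in zip(channel, mask)
        ])
    return weighted


# One 2 x 2 channel of constant features and a mask marking the left column as text.
features = [[[2.0, 2.0], [2.0, 2.0]]]
mask = [[1.0, 0.0], [0.5, 0.0]]
out = apply_text_mask(features, mask, floor=0.0)
```

With `floor=0.0` the weight equals the mask value, so positions with zero text probability are removed entirely. A small positive floor keeps some context, which may matter when the mask itself is imperfect.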

Acknowledgment

This work was supported by the XSEDE program of the National Science Foundation and the Aspire-II Research Program at the University of South Carolina. It used GPUs provided by NSF MRI-2018966.

Corresponding author

Correspondence to Song Wang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, L., Wu, Z., Wu, X., Wilsbacher, G., Wang, S. (2022). Background-Insensitive Scene Text Recognition with Text Semantic Segmentation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13685. Springer, Cham. https://doi.org/10.1007/978-3-031-19806-9_10

  • DOI: https://doi.org/10.1007/978-3-031-19806-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19805-2

  • Online ISBN: 978-3-031-19806-9

  • eBook Packages: Computer Science (R0)