Background-Insensitive Scene Text Recognition with Text Semantic Segmentation

Zhao, Liang; Wu, Zhenyao; Wu, Xinyi; Wilsbacher, Greg; Wang, Song

doi:10.1007/978-3-031-19806-9_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13685)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Scene Text Recognition (STR) has many important applications in computer vision. Complex backgrounds remain a major challenge for STR because they interfere with text feature extraction. Many existing methods use attentional regions, bounding boxes, or polygons to reduce such interference. However, the text regions located by these methods still contain considerable background interference. In this paper, we propose a Background-Insensitive approach, BINet, which explicitly leverages text Semantic Segmentation (SSN) to extract text more accurately. SSN is trained on a set of existing segmentation data whose volume is only 0.03% of the STR training data. This avoids the need for large-scale pixel-level annotation of the STR training data. To use the segmentation cues effectively, we design new segmentation refinement and embedding blocks that refine text masks and reinforce visual features. Additionally, we propose an efficient pipeline that uses Synthetic Initialization (SI) so that STR models are trained only on real data (1.7% of the STR training data), instead of on both synthetic and real data from scratch. Experiments show that the proposed method recognizes text against complex backgrounds more effectively, achieving state-of-the-art performance on several public datasets.
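
The general idea of using a text mask to reduce background interference can be illustrated with a minimal sketch. This is not BINet or its refinement and embedding blocks, which the abstract does not specify; it only assumes that a per-pixel text probability mask is already available and blends non-text pixels toward a constant. The function name `suppress_background` and the `fill` parameter are hypothetical, chosen for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale pixel values in [0, 1]
Mask = List[List[float]]   # per-pixel text probability in [0, 1]


def suppress_background(image: Image, text_mask: Mask, fill: float = 0.0) -> Image:
    """Blend each pixel toward `fill` in proportion to how unlikely it is to be text.

    Illustrative assumption only: a soft mask is applied directly to pixels.
    The paper's actual blocks operate differently and are not reproduced here.
    """
    if len(image) != len(text_mask):
        raise ValueError("image and mask must have the same height")
    result: Image = []
    for img_row, mask_row in zip(image, text_mask):
        if len(img_row) != len(mask_row):
            raise ValueError("image and mask must have the same width")
        # mask = 1 keeps the pixel, mask = 0 replaces it with `fill`
        result.append([m * p + (1.0 - m) * fill for p, m in zip(img_row, mask_row)])
    return result


if __name__ == "__main__":
    image = [[0.9, 0.2], [0.4, 0.8]]
    mask = [[1.0, 0.0], [0.5, 1.0]]
    print(suppress_background(image, mask))
```

Pixels that the mask marks as text pass through unchanged, while pixels marked as background are replaced by `fill`; intermediate mask values give a proportional blend.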

Acknowledgment

This work is supported by the XSEDE Program of the National Science Foundation and the Aspire-II Research Program at the University of South Carolina. This work used GPUs provided by the NSF MRI-2018966.

Author information

Authors and Affiliations

University of South Carolina, Columbia, SC, 29201, USA
Liang Zhao, Zhenyao Wu, Xinyi Wu, Greg Wilsbacher & Song Wang

Corresponding author

Correspondence to Song Wang.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Zhao, L., Wu, Z., Wu, X., Wilsbacher, G., Wang, S. (2022). Background-Insensitive Scene Text Recognition with Text Semantic Segmentation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13685. Springer, Cham. https://doi.org/10.1007/978-3-031-19806-9_10

DOI: https://doi.org/10.1007/978-3-031-19806-9_10
Published: 20 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19805-2
Online ISBN: 978-3-031-19806-9
eBook Packages: Computer Science, Computer Science (R0)
