EAFormer: Scene Text Segmentation with Edge-Aware Transformers

Yu, Haiyang; Fu, Teng; Li, Bin; Xue, Xiangyang

doi:10.1007/978-3-031-72698-9_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15083)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Scene text segmentation aims to segment text out of scene images, typically to help generative models edit or remove text. Existing text segmentation methods tend to add various text-related supervision signals to improve performance. However, most of them overlook text edges, which matter for downstream applications. In this paper, we propose Edge-Aware Transformers, termed EAFormer, to segment text more accurately, especially at text edges. Specifically, we first design a text edge extractor that detects edges and filters out those in non-text areas. Then, we propose an edge-guided encoder that makes the model focus more on text edges. Finally, an MLP-based decoder predicts the text masks. Extensive experiments on commonly used benchmarks verify the effectiveness of EAFormer: the proposed method outperforms previous methods, especially on the segmentation of text edges. Because the annotations of several benchmarks (e.g., COCO_TS and MLT_S) are not accurate enough to evaluate our method fairly, we have relabeled these datasets. In our experiments, the performance improvement of our method is larger when more accurate annotations are used for training. The code and datasets are available at https://hyangyu.github.io/EAFormer/.
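
The edge-filtering idea in the abstract (detect edges, then discard those in non-text areas) can be illustrated with a minimal sketch. This is not the authors' implementation: the gradient-threshold detector is a simple stand-in for whatever edge detector the paper uses, the text-area mask is supplied by hand rather than predicted, and the names `gradient_edges`, `filter_text_edges`, and `threshold` are hypothetical.

```python
import numpy as np


def gradient_edges(gray: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Boolean edge map from finite-difference gradient magnitude.

    Stand-in edge detector (assumption), not the one used in the paper.
    """
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy) > threshold


def filter_text_edges(edges: np.ndarray, text_mask: np.ndarray) -> np.ndarray:
    """Keep only the edge pixels that lie inside the text-area mask."""
    return np.logical_and(edges, text_mask.astype(bool))


# Toy 8x8 image: a bright square standing in for a glyph,
# plus a bright non-text blob in the lower-right corner.
image = np.zeros((8, 8))
image[1:4, 1:4] = 1.0   # "text" region
image[5:7, 5:7] = 1.0   # non-text object

# Hand-made text-area mask (assumption): covers only the upper-left 5x5 block.
text_mask = np.zeros((8, 8), dtype=bool)
text_mask[0:5, 0:5] = True

edges = gradient_edges(image)
text_edges = filter_text_edges(edges, text_mask)
```

Here `edges` contains the outlines of both bright regions, while `text_edges` retains only those around the glyph; an edge-guided encoder would then consume a map like `text_edges` as an extra cue.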

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 62176060), STCSM project (No. 22511105000), Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103), and the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.

Author information

Authors and Affiliations

Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China
Haiyang Yu, Teng Fu, Bin Li & Xiangyang Xue

Corresponding author

Correspondence to Bin Li.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Cite this paper

Yu, H., Fu, T., Li, B., Xue, X. (2025). EAFormer: Scene Text Segmentation with Edge-Aware Transformers. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15083. Springer, Cham. https://doi.org/10.1007/978-3-031-72698-9_24

DOI: https://doi.org/10.1007/978-3-031-72698-9_24
Published: 26 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72697-2
Online ISBN: 978-3-031-72698-9
eBook Packages: Computer Science, Computer Science (R0)
