FTDNet: Joint Semantic Learning for Scene Text Detection in Adverse Weather Conditions

Tian, Jiakun; Zhou, Gang; Liu, Yangxin; Deng, En; Jia, Zhenhong

doi:10.1007/978-3-031-41734-4_9


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14191)

Included in the following conference series:

International Conference on Document Analysis and Recognition


Abstract

In recent years, convolutional neural network (CNN)-based scene text detection methods have been studied extensively and have achieved strong results on public datasets. However, scene text detection in adverse weather conditions is hampered by poor visibility. In this paper, we address this problem with a multi-task learning approach. We construct a foggy text detection network (FTDNet) composed of two subnetworks: a text detection subnetwork and a visibility enhancement subnetwork. We employ DBNet as the text detection subnetwork, and the two subnetworks share the same feature extraction layers. We also design a feature visibility enhancement (FVE) module for the visibility enhancement subnetwork. To enable joint learning of the two tasks, we apply a novel loss function, the mask dehazing loss. The method achieves state-of-the-art detection results on both synthetic and real-world datasets.
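The abstract combines a DBNet detection branch, a visibility enhancement branch on shared features, and a mask dehazing loss that ties the two tasks together. The following plain-Python sketch, operating on tiny nested-list "images", illustrates these ideas under stated assumptions. `differentiable_binarization` follows the approximate step function from the DBNet paper, B = 1 / (1 + exp(-k(P - T))) with k = 50. The abstract does not define the mask dehazing loss, so `mask_dehazing_loss` is a hypothetical form (a mean L1 restoration loss in which pixels inside a text mask are up-weighted by `1 + alpha`), and `joint_loss` with weight `lam` is likewise an assumed way of combining the two objectives. None of these function or parameter names come from the paper.

```python
import math
from typing import List

# A single-channel H x W map stored as nested lists (illustrative only).
Grid = List[List[float]]


def differentiable_binarization(prob: Grid, thresh: Grid, k: float = 50.0) -> Grid:
    """DBNet-style approximate binarization: B = 1 / (1 + exp(-k * (P - T))).

    prob is the text probability map P, thresh the learned threshold map T;
    k = 50 is the amplification factor reported for DBNet.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prow, trow)]
        for prow, trow in zip(prob, thresh)
    ]


def mask_dehazing_loss(restored: Grid, clear: Grid, text_mask: Grid,
                       alpha: float = 1.0) -> float:
    """HYPOTHETICAL mask dehazing loss (the paper's exact form is not given here).

    Per-pixel L1 error between the restored image and the fog-free image,
    where pixels with text_mask = 1 receive weight 1 + alpha and all others
    weight 1; the result is normalized by the total weight.
    """
    total, weight_sum = 0.0, 0.0
    for rrow, crow, mrow in zip(restored, clear, text_mask):
        for r, c, m in zip(rrow, crow, mrow):
            w = 1.0 + alpha * m  # text pixels count (1 + alpha) times
            total += w * abs(r - c)
            weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("empty input maps")
    return total / weight_sum


def joint_loss(det_loss: float, dehaze_loss: float, lam: float = 1.0) -> float:
    """ASSUMED multi-task objective: detection loss plus lam times dehazing loss."""
    return det_loss + lam * dehaze_loss


if __name__ == "__main__":
    binary = differentiable_binarization([[0.9, 0.2], [0.6, 0.1]],
                                         [[0.3, 0.3], [0.3, 0.3]])
    print([[round(v, 3) for v in row] for row in binary])

    restored = [[0.5, 0.5], [0.5, 0.5]]
    clear = [[0.7, 0.5], [0.5, 0.3]]
    mask = [[1, 0], [0, 0]]  # only the top-left pixel is text
    print(round(mask_dehazing_loss(restored, clear, mask, alpha=1.0), 4))  # 0.12
    print(round(mask_dehazing_loss(restored, clear, mask, alpha=0.0), 4))  # 0.1
```

With `alpha = 0` the loss reduces to a plain mean L1 error (0.1 in this example). With `alpha = 1` it rises to 0.12 only because the restoration error on the single text pixel is counted twice. This weighting is one way a dehazing objective could be focused on the regions the detector cares about.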

Supported by the National Natural Science Foundation of China under grant Nos. 62166040, 62261053, and 62137002, the Natural Science Foundation of Xinjiang Autonomous Region under grant No. 2021D01C057, and the National Key R&D Program of China under grant No. 2021ZD0113601.



Author information

Authors and Affiliations

Key Laboratory of Signal Detection and Processing, School of Information Science and Engineering, Xinjiang University, Ürümqi, China
Jiakun Tian, Gang Zhou, Yangxin Liu, En Deng & Zhenhong Jia


Corresponding author

Correspondence to Gang Zhou.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MN, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi


About this paper

Cite this paper

Tian, J., Zhou, G., Liu, Y., Deng, E., Jia, Z. (2023). FTDNet: Joint Semantic Learning for Scene Text Detection in Adverse Weather Conditions. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_9


DOI: https://doi.org/10.1007/978-3-031-41734-4_9
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition
