Abstract
In recent years, convolutional neural network (CNN)-based scene text detection methods have been extensively studied and obtained successful results in public datasets. However, scene text detection in adverse weather conditions suffers from poor visibility. In this paper, we use a multi-task learning approach to resolve this issue. We construct a foggy text detection network (FTDNet) composed of dual subnetworks: a text detection subnetwork and a visibility enhancement subnetwork. We employ DBNet as the text detection subnetwork, which shares the feature extraction layers for both two subnetworks. And we design a feature visibility enhancement (FVE) module for visibility enhancement subnetwork. In order to enable joint learning of multi-task networks, a novelty loss function called the mask dehazing loss is applied. This method achieved state-of-the-art results in terms of detection on both synthetic datasets and real-to-world datasets.
Supported by National Natural Science Foundation of China under grant No. 62166040, 62261053, 62137002, Natural Science Foundation of Xinjiang Autonomous Region under grant No. 2021D01C057, and the National Key R &D Program of China under grant No. 2021ZD0113601.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Cai, B., Xu, X., Jia, K., Qing, C., Tao, D.: Dehazenet: an end-to-end system for single image haze removal. IEEE Trans. Image Process. 25(11), 5187–5198 (2016)
Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
Dai, P., Zhang, S., Zhang, H., Cao, X.: Progressive contour regression for arbitrary-shape scene text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7393–7402 (2021)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3cl: intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. Int. J. Comput. Vision 130, 1961–1977 (2022)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification (2015). arXiv e-prints arXiv:1502.01852. https://doi.org/10.48550/arXiv.1502.01852
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
He, P., Huang, W., He, T., Zhu, Q., Qiao, Y., Li, X.: Single shot text detector with regional attention. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3047–3055 (2017)
Huang, S.C., Le, T.H., Jaw, D.W.: Dsnet: joint semantic learning for object detection in inclement weather conditions. IEEE Trans. Pattern Anal. Mach. Intell. 43(8), 2623–2633 (2020)
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43
Karatzas, D., et al.: Icdar 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE (2015)
Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7482–7491 (2018)
Li, B., Peng, X., Wang, Z., Xu, J., Feng, D.: Aod-net: all-in-one dehazing network. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4770–4778 (2017)
Li, B., et al.: Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process. 28(1), 492–505 (2018)
Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: Textboxes: a fast text detector with a single deep neural network. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)
Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, pp. 11474–11481 (2020)
Liao, M., Zhu, Z., Shi, B., Xia, G.s., Bai, X.: Rotation-sensitive regression for oriented scene text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5909–5918 (2018)
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Liu, X., Ma, Y., Shi, Z., Chen, J.: Griddehazenet: attention-based multi-scale network for image dehazing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7314–7323 (2019)
Ma, J.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimedia 20(11), 3111–3122 (2018)
McCartney, E.J.: Optics of the atmosphere: scattering by molecules and particles. New York (1976)
Nayef, N., et al.: Icdar 2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1582–1587. IEEE (2019)
Nayef, N., et al.: Icdar 2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1454–1459. IEEE (2017)
Qin, X., Wang, Z., Bai, Y., Xie, X., Jia, H.: Ffa-net: feature fusion attention network for single image dehazing. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, pp. 11908–11915 (2020)
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28, 1–9 (2015)
Ren, W., Pan, J., Zhang, H., Cao, X., Yang, M.H.: Single image dehazing via multi-scale convolutional neural networks with holistic edges. Int. J. Comput. Vision 128, 240–259 (2020)
Sheng, T., Chen, J., Lian, Z.: Centripetaltext: an efficient text instance representation for scene text detection. Adv. Neural Inf. Process. Syst. 34, 335–346 (2021)
Shi, B., et al.: Icdar 2017 competition on reading Chinese text in the wild (rctw-17). In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1429–1434. IEEE (2017)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
Sun, Y., et al.: Icdar 2019 competition on large-scale street view text with partial labeling-rrc-LSVT. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1557–1562. IEEE (2019)
Ullah, H., et al.: Light-dehazenet: a novel lightweight cnn architecture for single image dehazing. IEEE Trans. Image Process. 30, 8968–8982 (2021)
Wang, C.Y., Yeh, I.H., Liao, H.Y.M.: You only learn one representation: unified network for multiple tasks (2021). arXiv preprint arXiv:2105.04206
Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)
Wang, W., et al.: Pan++: towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Trans. Pattern Anal. Mach. Intell. 44(9), 5349–5367 (2021)
Wang, W., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8440–8449 (2019)
Wu, H., et al.: Contrastive learning for compact single image dehazing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10551–10560 (2021)
Zhang, H., Patel, V.M.: Densely connected pyramid dehazing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3194–3203 (2018)
Zhou, X., et al.: East: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tian, J., Zhou, G., Liu, Y., Deng, E., Jia, Z. (2023). FTDNet: Joint Semantic Learning for Scene Text Detection in Adverse Weather Conditions. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_9
Download citation
DOI: https://doi.org/10.1007/978-3-031-41734-4_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer ScienceComputer Science (R0)