A two-stage image process for water level recognition via dual-attention CornerNet and CTransformer

Qiu, Run; Cai, Zhaohui; Chang, Zhuoqing; Liu, Shubo; Tu, Guoqing

doi:10.1007/s00371-022-02501-6

Original article
Published: 09 May 2022

Volume 39, pages 2933–2952 (2023)

The Visual Computer

Abstract

Image processing-based water level detectors have promising practical value in intelligent agriculture and early waterlogging alerts. However, image-based water level recognition is challenged by variations in illumination and shooting angle and by sediment contamination. In addition, reflections from the water surface make it difficult to accurately extract the water level ruler (WLR) on the water surface. This paper proposes a novel dual-attention CornerNet for WLR image extraction and a CTransformer for WLR sequence recognition. First, a dual-attention mechanism is introduced to capture global information and thereby better predict semantic segmentation feature maps and corner information. Then, an asymmetric-convolution ResNet-50 extracts multi-local information so that characters of inconsistent size, caused by the different shooting angles of WLRs, can be recognized effectively. Designing vision backbones with self-attention has recently become an active topic; in this work, an improved CTransformer is designed to retain sufficient global context information and to extract more discriminative features for sequence recognition via multi-head self-attention. Evaluation on our in-house dataset shows that the proposed framework achieves an F-score of 91.37 in the detection stage and, in the recognition stage, an accuracy of 95.37% with the error relative to human estimation within 0.3 cm. The proposed method is also evaluated on several benchmarks, and the experimental results show that it outperforms existing methods.
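The two headline metrics in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the evaluation protocol, so the code below assumes the standard definition of the F-score (harmonic mean of precision and recall over detections) and assumes that a reading counts as correct when it lies within 0.3 cm of the human estimate. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F-score (harmonic mean of precision and recall) as a percentage."""
    detected = true_positives + false_positives   # everything the detector reported
    actual = true_positives + false_negatives     # everything that should be found
    if detected == 0 or actual == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


def accuracy_within_tolerance(
    predicted_cm: Sequence[float],
    reference_cm: Sequence[float],
    tolerance_cm: float = 0.3,
) -> float:
    """Return the percentage of readings within tolerance_cm of the reference.

    Assumption: the reference is the human-estimated water level, and the
    0.3 cm default mirrors the tolerance quoted in the abstract.
    """
    if len(predicted_cm) != len(reference_cm):
        raise ValueError("predicted and reference readings must have equal length")
    if not predicted_cm:
        return 0.0
    eps = 1e-9  # guards against floating-point noise exactly at the boundary
    hits = sum(
        1 for p, r in zip(predicted_cm, reference_cm)
        if abs(p - r) <= tolerance_cm + eps
    )
    return 100.0 * hits / len(predicted_cm)


if __name__ == "__main__":
    # Hypothetical counts and readings, for illustration only.
    print(round(f_score(true_positives=90, false_positives=10, false_negatives=10), 2))
    print(accuracy_within_tolerance([12.1, 35.0, 48.9, 60.0], [12.0, 35.2, 48.0, 60.0]))
```

In this sketch, three of the four hypothetical readings fall within the tolerance, so the second function reports three quarters of the readings as correct. The paper's own matching rule for detections (for example, an overlap threshold) is not given in the abstract and is therefore left out.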

Acknowledgements

This research was supported by the Major National Science and Technology Projects under Grant No. 2017ZX07108-001 and the Wuhan Frontier Project on Applied Foundations under Grant No. 2020020601012266.

Author information

Authors and Affiliations

School of Computer Science, Wuhan University, Wuhan, People’s Republic of China
Run Qiu, Zhaohui Cai, Zhuoqing Chang, Shubo Liu & Guoqing Tu

Corresponding author

Correspondence to Zhaohui Cai.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qiu, R., Cai, Z., Chang, Z. et al. A two-stage image process for water level recognition via dual-attention CornerNet and CTransformer. Vis Comput 39, 2933–2952 (2023). https://doi.org/10.1007/s00371-022-02501-6

Accepted: 10 April 2022
Published: 09 May 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s00371-022-02501-6
