Abstract
Image processing-based water level detectors have promising practical applications in intelligent agriculture and early waterlogging alerts. However, image-based water level recognition is challenged by illumination changes, varying shooting angles, and sediment contamination. In addition, reflections from the water surface make it difficult to accurately extract the water level ruler (WLR) on the water surface. This paper proposes a dual-attention CornerNet for WLR image extraction and a CTransformer for WLR sequence recognition. First, a dual-attention mechanism is introduced to capture global information, which improves the prediction of semantic segmentation feature maps and corner information. Then, a ResNet-50 with asymmetric convolutions is used to extract multiple local features, so that the network can recognize characters whose sizes vary because WLRs are photographed from different angles. Vision backbones built on self-attention have recently become an active research topic. In this work, an improved CTransformer is designed that uses multi-head self-attention to retain sufficient global context and extract more discriminative features for sequence recognition. On our in-house dataset, the proposed framework achieves an F-score of 91.37 in the detection stage and a recognition-stage accuracy of 95.37%, where a reading counts as correct if its error relative to the human estimate is within 0.3 cm. The proposed method is also evaluated on several benchmarks, and the experimental results show that it outperforms existing methods.
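The abstract reports two evaluation figures: a detection F-score and a recognition accuracy defined by a 0.3 cm tolerance against human readings. The sketch below shows how such metrics are commonly computed. It assumes the F-score is the standard harmonic mean of precision and recall, and that a reading counts as correct when its absolute deviation from the human estimate is at most 0.3 cm. The function names `f_score` and `tolerance_accuracy` are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Sequence


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 / F-measure).

    Inputs are fractions in [0, 1]; multiply the result by 100 to get a
    percentage-style score such as the 91.37 reported in the abstract.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def tolerance_accuracy(
    predicted_cm: Sequence[float],
    human_cm: Sequence[float],
    tol_cm: float = 0.3,
) -> float:
    """Fraction of readings whose absolute error vs. the human reading is <= tol_cm.

    Assumption: this is one plausible reading of "human estimation error
    within 0.3 cm"; the paper's exact protocol may differ.
    """
    if len(predicted_cm) != len(human_cm):
        raise ValueError("predicted and reference readings must have equal length")
    if not predicted_cm:
        raise ValueError("need at least one reading")
    # A tiny epsilon keeps an error of exactly tol_cm (e.g. 12.3 vs 12.0)
    # from being rejected because of floating-point rounding.
    eps = 1e-9
    hits = sum(abs(p - h) <= tol_cm + eps for p, h in zip(predicted_cm, human_cm))
    return hits / len(predicted_cm)


if __name__ == "__main__":
    print(f_score(0.5, 1.0))
    print(tolerance_accuracy([10.0, 12.3, 15.0, 20.0], [10.1, 12.0, 14.0, 20.0]))
```

In the usage example, the errors are 0.1, 0.3, 1.0, and 0.0 cm, so three of the four readings fall within the 0.3 cm tolerance.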
Acknowledgements
This research was supported by the Major National Science and Technology Projects under Grant No. 2017ZX07108-001 and the Wuhan Frontier Project on Applied Foundations under Grant No. 2020020601012266.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Qiu, R., Cai, Z., Chang, Z. et al. A two-stage image process for water level recognition via dual-attention CornerNet and CTransformer. Vis Comput 39, 2933–2952 (2023). https://doi.org/10.1007/s00371-022-02501-6
DOI: https://doi.org/10.1007/s00371-022-02501-6