CorFormer: a hybrid transformer-CNN architecture for corrosion segmentation on metallic surfaces

Research article. Published in Machine Vision and Applications.

Abstract

Periodic corrosion inspection is essential for maintaining steel structures. However, current manual inspection approaches have serious limitations: they are time-consuming, subjective, and risky. To address these limitations, extensive research over the past decade has assessed the feasibility of Convolutional Neural Networks (CNNs) for automating corrosion inspection. Meanwhile, Transformer networks have recently emerged as powerful tools in computer vision because of their ability to model complex global relationships. In this paper, a novel hybrid architecture, dubbed CorFormer, is proposed for effective and efficient automation of corrosion inspection. The CorFormer encoder fuses Transformer and CNN layers at different stages, capturing global context through the Transformer layers while leveraging the inherent inductive bias of the CNN layers. To bridge the semantic gap between the features produced by the Transformer and CNN layers, a Semantic Gap Merger (SGM) module is introduced after each feature merge operation. The encoder is complemented by a hierarchical decoder that decodes complex features at both large and small scales. CorFormer is compared against state-of-the-art CNN and Transformer architectures for corrosion segmentation and outperforms the best alternative by 2.7% in terms of Intersection over Union (IoU) across 10 validation data splits. It also supports real-time inspection at 28 frames per second. Rigorous statistical tests support these findings, and an extensive ablation study validates each design choice.
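The abstract reports accuracy as IoU averaged over 10 validation splits and a 2.7% margin over the best alternative, without stating here whether that margin is absolute (percentage points) or relative. The sketch below is an illustrative, standard-library-only Python example, not the authors' evaluation code: it computes binary IoU for the corrosion class on flattened masks, summarizes per-split scores, and shows how the two readings of a gap differ. The function names, the toy masks, and the convention that an empty prediction on an empty target scores 1.0 are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def binary_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """IoU of the positive (corrosion) class for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labeled corrosion in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Pixels labeled corrosion in at least one mask.
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: an empty prediction on an empty target is perfect.
    return 1.0 if union == 0 else intersection / union


def summarize_splits(ious: Sequence[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-split IoU scores (needs >= 2 splits)."""
    return mean(ious), stdev(ious)


def iou_gap(model_iou: float, baseline_iou: float) -> tuple[float, float]:
    """Absolute gap (percentage points) and relative gap (percent) between two IoUs."""
    absolute_pp = 100.0 * (model_iou - baseline_iou)
    relative_pct = 100.0 * (model_iou - baseline_iou) / baseline_iou
    return absolute_pp, relative_pct


if __name__ == "__main__":
    # Hypothetical 2x4 masks flattened to 8 pixels; 1 = corrosion, 0 = background.
    target = [1, 1, 0, 0, 1, 1, 0, 0]
    pred = [1, 0, 0, 0, 1, 1, 1, 0]
    print(binary_iou(pred, target))
```

Run as a script, it prints 0.6 for the toy masks (3 overlapping corrosion pixels out of 5 in the union). With hypothetical mean IoUs of 0.60 and 0.50, `iou_gap` returns roughly (10.0, 20.0): a 10-point absolute gain but a 20% relative gain, which is why the unit of a reported margin matters.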

Figures: Fig. 1–14 and Algorithm 1.

Data availability

No datasets were generated or analysed during the current study.

Author information

Contributions

A.S. conceptualized and implemented the architecture, performed the experiments, and wrote the manuscript. C.Q. collected the dataset and wrote part of the introduction and literature review sections. R.S. helped C.Q. with data collection and the literature review and helped edit the manuscript. M.R.J. provided guidance throughout the project, from conception to completion, and helped edit the manuscript.

Corresponding author

Correspondence to Abhishek Subedi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Subedi, A., Qian, C., Sadeghian, R. et al. CorFormer: a hybrid transformer-CNN architecture for corrosion segmentation on metallic surfaces. Machine Vision and Applications 36, 45 (2025). https://doi.org/10.1007/s00138-025-01663-2

  • DOI: https://doi.org/10.1007/s00138-025-01663-2
