
A novel hybrid attention gate based on vision transformer for the detection of surface defects

  • Original Paper
  • Published in: Signal, Image and Video Processing (2024)

Abstract

Many advanced models have been proposed for automatic surface defect inspection. Although CNN-based methods have achieved superior performance among these models, the locality of the convolution operation limits their ability to extract global semantic details, which are essential for detecting surface defects. Recently, inspired by the success of the Transformer, whose global self-attention mechanism can model global semantic details, researchers have begun applying Transformer-based methods to many computer-vision challenges. However, as many researchers have noted, Transformers lose spatial details while extracting semantic features. To alleviate these problems, this paper proposes a Transformer-based Hybrid Attention Gate (HAG) model that extracts both global semantic features and spatial features. The HAG model consists of a Transformer (Trans) branch, a channel squeeze-spatial excitation (sSE) branch, and a merge process. The Trans branch extracts global semantic features and the sSE branch extracts spatial features. The merge process, which comes in four variants (concat, add, max, and mul), combines these two feature streams effectively. Finally, four versions of the HAG-based Feature Fusion Network (HAG-FFN) were developed using the proposed HAG model for the detection of surface defects. Four datasets were used to test the performance of the proposed HAG-FFN versions. In the experiments, the proposed model produced mIoU scores of 83.83%, 79.34%, 76.53%, and 81.78% on the MT, MVTec-Texture, DAGM, and AITEX datasets, respectively. These results show that the proposed HAGmax-FFN model outperforms state-of-the-art models.
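The paper's code is available only on request, so as an illustration of the two operations the abstract names, the following NumPy sketch shows a channel squeeze-spatial excitation (sSE) gate and the four merge variants (concat, add, max, mul) that combine the Transformer and sSE feature maps. All function names, the weight vector `w`, and the (C, H, W) layout are hypothetical; this is a minimal sketch of the general technique, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sse_gate(feat, w):
    """Channel squeeze-spatial excitation (sSE): a 1x1 projection across
    channels yields one spatial map, which gates the input after a sigmoid."""
    # feat: (C, H, W); w: (C,) hypothetical 1x1-conv weights
    spatial = sigmoid(np.tensordot(w, feat, axes=([0], [0])))  # (H, W)
    return feat * spatial[None, :, :]                          # (C, H, W)

def merge(trans_feat, sse_feat, mode="max"):
    """Combine the Transformer and sSE feature maps; the four modes
    correspond to the four HAG variants described in the abstract."""
    if mode == "concat":
        return np.concatenate([trans_feat, sse_feat], axis=0)  # stack channels
    if mode == "add":
        return trans_feat + sse_feat                           # element-wise sum
    if mode == "max":
        return np.maximum(trans_feat, sse_feat)                # element-wise max
    if mode == "mul":
        return trans_feat * sse_feat                           # element-wise product
    raise ValueError(f"unknown merge mode: {mode}")
```

Note that `concat` doubles the channel dimension while the other three preserve it, so a network using the concat variant would need a wider layer immediately after the gate.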


Data availability

The code of this study is available on request from the corresponding author.



Funding

This work was supported by the Inonu University Scientific Research Projects Coordination [Grant Number FDK-2021-2725].

Author information


Contributions

All authors read and approved the final manuscript. The individual contributions are as follows: Hüseyin Üzen: methodology, software, writing (original draft), visualization. Muammer Turkoglu: discussion of results, writing (original draft), validation, formal analysis. Dursun Ozturk: reviewing and editing, validation, supervision. Davut Hanbay: supervision, validation, formal analysis.

Corresponding author

Correspondence to Hüseyin Üzen.

Ethics declarations

Ethical approval

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Üzen, H., Turkoglu, M., Ozturk, D. et al. A novel hybrid attention gate based on vision transformer for the detection of surface defects. SIViP 18, 6835–6851 (2024). https://doi.org/10.1007/s11760-024-03355-2

