
A novel hybrid attention gate based on vision transformer for the detection of surface defects

  • Original Paper
  • Published in: Signal, Image and Video Processing (2024)

Abstract

Many advanced models have been proposed for automatic surface defect inspection. Although CNN-based methods have achieved superior performance among these models, the locality of the convolution operation limits their ability to extract global semantic details, which are essential for detecting surface defects. Recently, inspired by the success of the Transformer, whose global self-attention mechanism can model global semantic details, researchers have begun applying Transformer-based methods to many computer-vision challenges. However, as many researchers have noted, Transformers lose spatial details while extracting semantic features. To alleviate these problems, this paper proposes a Transformer-based Hybrid Attention Gate (HAG) model that extracts both global semantic features and spatial features. The HAG model consists of a Transformer (Trans) branch, a channel squeeze-spatial excitation (sSE) branch, and a merge process. The Trans branch extracts global semantic features and the sSE branch extracts spatial features. The merge process, which comes in four variants (concat, add, max, and mul), combines these two feature streams effectively. Finally, four versions of the HAG-based Feature Fusion Network (HAG-FFN) were developed using the proposed HAG model for the detection of surface defects. Four datasets were used to test the performance of the proposed HAG-FFN versions. In the experiments, the proposed model produced mIoU scores of 83.83%, 79.34%, 76.53%, and 81.78% on the MT, MVTec-Texture, DAGM, and AITEX datasets, respectively. These results show that the proposed HAGmax-FFN model outperforms state-of-the-art models.
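The paper's code is available only on request, so as an illustration of the two operations the abstract names, the following NumPy sketch shows a channel squeeze-spatial excitation (sSE) gate and the four merge variants (concat, add, max, mul) that combine the Transformer and sSE feature maps. All function names, the weight vector `w`, and the (C, H, W) layout are hypothetical; this is a minimal sketch of the general technique, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sse_gate(feat, w):
    """Channel squeeze-spatial excitation (sSE): a 1x1 projection across
    channels yields one spatial map, which gates the input after a sigmoid."""
    # feat: (C, H, W); w: (C,) hypothetical 1x1-conv weights
    spatial = sigmoid(np.tensordot(w, feat, axes=([0], [0])))  # (H, W)
    return feat * spatial[None, :, :]                          # (C, H, W)

def merge(trans_feat, sse_feat, mode="max"):
    """Combine the Transformer and sSE feature maps; the four modes
    correspond to the four HAG variants described in the abstract."""
    if mode == "concat":
        return np.concatenate([trans_feat, sse_feat], axis=0)  # stack channels
    if mode == "add":
        return trans_feat + sse_feat                           # element-wise sum
    if mode == "max":
        return np.maximum(trans_feat, sse_feat)                # element-wise max
    if mode == "mul":
        return trans_feat * sse_feat                           # element-wise product
    raise ValueError(f"unknown merge mode: {mode}")
```

Note that `concat` doubles the channel dimension while the other three preserve it, so a network using the concat variant would need a wider layer immediately after the gate.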


Data availability

The code of this study is available on request from the corresponding author.



Funding

This work was supported by the Inonu University Scientific Research Projects Coordination [Grant Number FDK-2021-2725].

Author information


Contributions

All authors read and approved the final manuscript. The individual contributions are as follows: Hüseyin Üzen: methodology, software, writing (original draft), visualization. Muammer Turkoglu: discussion of results, writing (original draft), validation, formal analysis. Dursun Ozturk: reviewing and editing, validation, supervision. Davut Hanbay: supervision, validation, formal analysis.

Corresponding author

Correspondence to Hüseyin Üzen.

Ethics declarations

Ethical approval

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Üzen, H., Turkoglu, M., Ozturk, D. et al. A novel hybrid attention gate based on vision transformer for the detection of surface defects. SIViP 18, 6835–6851 (2024). https://doi.org/10.1007/s11760-024-03355-2

