Abstract
Many advanced models have been proposed for automatic surface defect inspection. Although CNN-based methods have achieved superior performance among these models, they are limited in extracting global semantic details because of the locality of the convolution operation, and global semantic details are essential for detecting surface defects. Recently, inspired by the success of the Transformer, whose global self-attention mechanism gives it a powerful ability to model global semantic details, researchers have begun applying Transformer-based methods to many computer-vision challenges. However, as many researchers have noted, Transformers lose spatial details while extracting semantic features. To alleviate these problems, this paper proposes a Transformer-based Hybrid Attention Gate (HAG) model that extracts both global semantic features and spatial features. The HAG model consists of a Transformer (Trans) branch, a channel Squeeze-spatial Excitation (sSE) branch, and a merge process. The Trans branch extracts global semantic features and the sSE branch extracts spatial features. The merge process, implemented in four variants (concat, add, max, and mul), combines these two complementary feature sets effectively. Finally, four versions of the HAG-based Feature Fusion Network (HAG-FFN) were developed using the proposed HAG model for the detection of surface defects. Four datasets were used to evaluate the proposed HAG-FFN versions. In the experiments, the proposed model achieved mIoU scores of 83.83%, 79.34%, 76.53%, and 81.78% on the MT, MVTec-Texture, DAGM, and AITEX datasets, respectively. These results show that the proposed HAGmax-FFN model outperformed state-of-the-art models.
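The two HAG components named above can be sketched in plain NumPy. The abstract does not give the layer configurations, so the sSE weight shapes and the exact merge semantics below are illustrative assumptions, not the authors' implementation: sSE is modeled as a 1x1 channel-squeeze followed by a sigmoid spatial gate, and the four merge variants operate elementwise on two feature maps of equal shape.

```python
import numpy as np

def sse_attention(feat, w, b):
    """channel Squeeze-spatial Excitation (sSE), sketched: a 1x1 conv
    squeezes the channel axis to a single spatial map, and a sigmoid
    gates every location. feat: (C, H, W); w: (C,) squeeze weights;
    b: scalar bias. Weight shapes are illustrative assumptions."""
    spatial = np.tensordot(w, feat, axes=([0], [0])) + b  # -> (H, W)
    gate = 1.0 / (1.0 + np.exp(-spatial))                 # sigmoid in [0, 1]
    return feat * gate                                    # broadcast over channels

def hag_merge(trans_feat, sse_feat, mode="max"):
    """Combine global (Transformer) and spatial (sSE) feature maps with
    one of the four merge variants named in the abstract."""
    if mode == "concat":
        return np.concatenate([trans_feat, sse_feat], axis=0)  # stack channels
    if mode == "add":
        return trans_feat + sse_feat
    if mode == "max":
        return np.maximum(trans_feat, sse_feat)
    if mode == "mul":
        return trans_feat * sse_feat
    raise ValueError(f"unknown merge mode: {mode}")
```

Note that `concat` doubles the channel count while the other three variants preserve shape, which is why a fusion network built on `concat` needs a wider following layer than one built on `add`, `max`, or `mul`.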
Data availability
The code of this study is available on request from the corresponding author.
Funding
This work was supported by the Inonu University Scientific Research Projects Coordination [Grant Number FDK-2021-2725].
Author information
Contributions
All authors read and approved the final manuscript. The authors' individual contributions are as follows: Hüseyin Üzen: Methodology, Software, Writing - Original Draft Preparation, Visualization. Muammer Turkoglu: Discussion of results, Writing - Original Draft Preparation, Validation, Formal analysis. Dursun Ozturk: Reviewing and Editing, Validation, Supervision. Davut Hanbay: Supervision, Validation, Formal analysis.
Ethics declarations
Ethical approval
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Üzen, H., Turkoglu, M., Ozturk, D. et al. A novel hybrid attention gate based on vision transformer for the detection of surface defects. SIViP 18, 6835–6851 (2024). https://doi.org/10.1007/s11760-024-03355-2