
Detection of irregular small defects on metal base surface of infrared laser diode based on deep learning

Multimedia Tools and Applications


The rapid development of deep learning has advanced research in visual object detection. However, traditional metal surface defect detection methods are not effective at detecting small, irregular defects, so improving the detection accuracy for such defects on metal surfaces remains an active and difficult research problem. In this study, we propose a deep learning-based method for detecting small irregular defects on metal surfaces. First, we build our own surface defect dataset of infrared laser diode metal bases, because no suitable dataset was available. Second, we design a new shallow feature extraction layer on top of You Only Look Once (YOLO) v5s, the small variant of YOLOv5, to detect small irregular defects in this dataset. Third, we introduce an attention mechanism into the network to strengthen its ability to extract features of small defects. The improved model was trained and evaluated on the laser diode metal base surface defect dataset, and its detection accuracy improved by 3.8% over the original detection algorithm. The improved model also compares favorably in detection accuracy with other well-established object detection algorithms.
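The abstract does not specify how the shallow feature extraction layer is wired into YOLOv5s. A common way to target small objects is to add a detection scale on a higher-resolution, lower-stride feature map, and the sketch below assumes that interpretation. It also assumes a 640×640 input, the YOLOv5s default detection strides of 8, 16 and 32, and a hypothetical added stride-4 layer. It only computes feature-map sizes and how many cells a small defect covers along one axis, to illustrate why a shallower layer can help.

```python
# Sketch: how many feature-map cells a small defect spans at each detection stride.
# Assumptions (not stated in the paper): 640x640 input, YOLOv5s default strides
# 8/16/32, and the added shallow layer modeled as a hypothetical stride-4 head.


def grid_size(img_size: int, stride: int) -> int:
    """Side length, in cells, of the feature map produced at a given stride."""
    return img_size // stride


def cells_spanned(defect_px: float, stride: int) -> float:
    """Approximate number of feature-map cells a defect covers along one axis."""
    return defect_px / stride


if __name__ == "__main__":
    img_size = 640
    defect_px = 8  # hypothetical defect width in pixels
    for stride in (4, 8, 16, 32):
        side = grid_size(img_size, stride)
        print(f"stride {stride:2d}: grid {side}x{side}, "
              f"defect spans {cells_spanned(defect_px, stride):.2f} cells")
```

Under these assumptions, a hypothetical 8-pixel defect covers two cells on the stride-4 map but only a quarter of a cell on the stride-32 map, where its features are more easily diluted by the surrounding background.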
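The abstract does not name the attention mechanism that was added to the network. As one widely used example, the sketch below implements squeeze-and-excitation (SE) channel attention in plain Python: it averages each channel, passes the averages through two small fully connected layers, and rescales each channel by a sigmoid weight. The choice of SE, the weights, the reduction ratio, and the omission of biases are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def se_attention(fmap: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Squeeze-and-excitation channel attention on a C x H x W feature map.

    w1: (C/r) x C weights of the reduction layer (r = reduction ratio).
    w2: C x (C/r) weights of the expansion layer.
    Biases are omitted to keep the sketch short (an assumption).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scales = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every spatial position of each channel by its weight.
    return [[[v * s for v in row] for row in ch] for ch, s in zip(fmap, scales)]


if __name__ == "__main__":
    # Hypothetical 2-channel, 2x2 feature map with reduction ratio r = 2.
    fmap = [[[1.0, 1.0], [1.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
    w1 = [[1.0, 0.0]]      # 1 x 2 reduction weights
    w2 = [[2.0], [-2.0]]   # 2 x 1 expansion weights
    out = se_attention(fmap, w1, w2)
    print(out[0][0][0], out[1][0][0])
```

With these toy weights the first channel is kept mostly intact while the second is strongly down-weighted, which is the kind of channel reweighting an attention module can use to emphasize features of small defects.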


Figs. 1–8


Data availability

The ILS-MB dataset that supports the findings of this study was used under license for the current study only, and is therefore not publicly available.




This work was supported by the Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology (FMZ201901) and by the National Natural Science Foundation of China project "Research on bionic chewing robot for physical property detection and evaluation of food materials" (51375209).

Author information



Corresponding author

Correspondence to Jinghu Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, X., Liu, J., Zhou, X. et al. Detection of irregular small defects on metal base surface of infrared laser diode based on deep learning. Multimed Tools Appl 83, 19181–19197 (2024).



