Abstract
The rapid development of deep learning has advanced research on visual object detection. Traditional metal surface defect detection methods, however, are not effective at detecting small, irregular defects, so improving detection accuracy for such defects on metal surfaces remains an active and difficult research problem. In this study, we propose a deep learning-based method for detecting small irregular defects on metal surfaces. First, we build our own dataset of surface defects on infrared laser diode metal bases to fill the gap left by the lack of such a dataset. Second, we design a new shallow feature extraction layer based on You Only Look Once (YOLO) v5s (an improved YOLO model) to detect the small irregular defects in this dataset. Third, we introduce an attention mechanism into the network model to strengthen its ability to extract features of small target defects. The improved model was trained and evaluated on the laser diode metal base surface defect dataset, and the results show that its accuracy is 3.8% higher than that of the original detection algorithm. Its detection accuracy also compares favorably with that of other leading object detection algorithms.
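As a rough illustration of why a shallower feature layer can help with small defects, the sketch below compares how many feature-map cells a small defect spans at different downsampling strides. The abstract does not give the stride of the new layer, the input resolution, or typical defect sizes, so the 640-pixel input, the stride-4 layer, and the 12-pixel defect are all assumptions chosen for illustration. Strides 8, 16, and 32 are the detection scales commonly used by YOLOv5.

```python
# Illustrative sketch only. INPUT_SIZE, the stride-4 layer, and DEFECT_PX are
# assumptions for this example, not values reported in the abstract.

def grid_size(input_size: int, stride: int) -> int:
    """Number of cells along one side of the feature map at a given stride."""
    return input_size // stride


def cells_covered(defect_px: float, stride: int) -> float:
    """Approximate width of a defect, measured in feature-map cells."""
    return defect_px / stride


INPUT_SIZE = 640          # assumed input resolution in pixels
STRIDES = [4, 8, 16, 32]  # 4 = hypothetical added shallow layer; 8/16/32 = usual scales
DEFECT_PX = 12            # hypothetical defect width in pixels

for stride in STRIDES:
    side = grid_size(INPUT_SIZE, stride)
    span = cells_covered(DEFECT_PX, stride)
    print(f"stride {stride:>2}: grid {side}x{side}, defect spans {span:.2f} cells")
```

Under these assumptions, the defect spans several cells only on the shallowest map and less than one cell on the deepest ones, which is the usual motivation for adding a shallow, high-resolution detection layer for small targets.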
Data availability
The ILS-MB dataset supporting the results of this study was used under a license granted for the current study only, so the data are not publicly available.
Acknowledgements
This work was supported by the Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology (FMZ201901) and by the National Natural Science Foundation of China project “Research on bionic chewing robot for physical property detection and evaluation of food materials” (51375209).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhu, X., Liu, J., Zhou, X. et al. Detection of irregular small defects on metal base surface of infrared laser diode based on deep learning. Multimed Tools Appl 83, 19181–19197 (2024). https://doi.org/10.1007/s11042-023-16352-3
DOI: https://doi.org/10.1007/s11042-023-16352-3