
An adaptive loss weighting multi-task network with attention-guide proposal generation for small size defect inspection

  • Original article
  • Published in The Visual Computer

Abstract

Computer vision-based detection approaches have been widely used in defect inspection tasks. However, identifying small-sized defects remains a challenge for most existing methods, mainly because: (1) existing methods fail to extract sufficient information from small-sized defects; and (2) existing detectors cannot generate effective region proposals for small-sized defects, which results in a low recall rate. To address these issues, an adaptive loss weighting multi-task model with attention-guide proposal generation is proposed. First, the multi-task model mines contextual information to enrich the features of small-sized defect areas, enhancing the model's representation capability. Second, to improve the recall rate on small-sized defects, an object attention-guide proposal generation module is proposed, which leverages object attention to boost the confidence of small-sized defects and thereby generates more high-quality region proposals for them. Finally, to speed up the joint optimization of the multi-task framework, an adaptive loss weighting algorithm is proposed that learns the optimal combination of the multi-task loss functions by maintaining gradient-direction consistency and tuning each task's loss magnitude. Experimental results on two public defect datasets demonstrate that the proposed method outperforms other state-of-the-art methods.
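The paper's adaptive loss weighting algorithm is not reproduced on this page, but the two ingredients the abstract names, keeping task gradients directionally consistent and balancing each task's loss magnitude, can be pictured with a GradNorm-style sketch. The function name, the cosine-based consistency term, and the exact weighting formula below are illustrative assumptions, not the authors' method:

```python
import numpy as np

def adaptive_task_weights(grads, losses, initial_losses, alpha=1.0):
    """Illustrative multi-task loss weighting (not the paper's algorithm).

    grads:          one flattened shared-parameter gradient per task
    losses:         current loss value per task
    initial_losses: loss value per task at the start of training
    alpha:          strength of the magnitude-balancing term
    """
    grads = [np.asarray(g, dtype=float) for g in grads]
    mean_grad = np.mean(grads, axis=0)

    # 1) Gradient-direction consistency: down-weight tasks whose gradient
    #    points against the average descent direction (cosine <= 0).
    consistency = []
    for g in grads:
        cos = g @ mean_grad / (np.linalg.norm(g) * np.linalg.norm(mean_grad) + 1e-12)
        consistency.append(max(cos, 0.0))

    # 2) Magnitude balancing (GradNorm-style): tasks that have decayed less
    #    relative to their initial loss get proportionally larger weights.
    rates = np.array([l / (l0 + 1e-12) for l, l0 in zip(losses, initial_losses)])
    balance = (rates / rates.mean()) ** alpha

    w = np.array(consistency) * balance
    return w / (w.sum() + 1e-12)  # normalize so the weights sum to 1
```

In this sketch, two tasks with identical gradients and equal relative progress receive equal weights, while a task whose gradient is orthogonal or opposed to the average descent direction is suppressed for that step.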
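Similarly, the attention-guided confidence enhancement can be pictured as modulating a detector's proposal objectness map with an object-attention map. This toy sketch (the function name, the `gamma` parameter, and the multiplicative form are assumptions for illustration, not the paper's module) shows the general mechanism:

```python
import numpy as np

def attention_boosted_objectness(objectness, attention, gamma=1.0):
    """Raise proposal confidence where object attention is high.

    objectness: H x W array of proposal confidence scores in [0, 1]
    attention:  H x W object-attention map in [0, 1]
    gamma:      strength of the attention-driven boost
    """
    objectness = np.asarray(objectness, dtype=float)
    attention = np.asarray(attention, dtype=float)
    # Multiplicative boost, clipped back into a valid score range, so weak
    # responses on small defects can survive proposal score thresholding.
    return np.clip(objectness * (1.0 + gamma * attention), 0.0, 1.0)
```

For example, with a score threshold of 0.3, a small defect with raw objectness 0.2 under attention 0.8 would be boosted to 0.36 and kept as a proposal, while regions with no attention keep their original scores.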


Data availability

The DTU-Drone inspection dataset that supports the findings of this study is available at https://data.mendeley.com/datasets/hd96prn3nc. The NEUDET dataset is available at http://faculty.neu.edu.cn/yunhyan/NEU_surface_defect_database.html.


Acknowledgements

This work was supported by the Key-Area Research and Development Program of Guangdong Province under Grants 2020B1111010002 and 2018B010109001, the 2021 Guangdong Provincial Science and Technology Special Fund ("Big Project + Task List") under Grant 210719145863737, the Guangdong Marine Economic Development Project under Grant GDNRC[2020]018, and the Laboratory of Autonomous Systems and Network Control of the Ministry of Education.

Author information

Corresponding authors

Correspondence to Bin Li or Lianfang Tian.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, H., Li, B., Tian, L. et al. An adaptive loss weighting multi-task network with attention-guide proposal generation for small size defect inspection. Vis Comput 40, 681–698 (2024). https://doi.org/10.1007/s00371-023-02809-x

