Abstract
Computer vision-based detection approaches have been widely used in defect inspection tasks. However, identifying small-sized defects remains a challenge for most existing methods, mainly because: (1) existing methods fail to extract sufficient information from small-sized defects; and (2) existing detectors cannot generate effective region proposals for small-sized defects, which results in a low recall rate. To address these issues, an adaptive loss weighting multi-task model with attention-guided proposal generation is proposed. First, the multi-task model exploits contextual information to enrich the features of small-sized defect areas, enhancing the model’s representation capability. Second, to improve the recall rate on small-sized defects, an object attention-guided proposal generation module is proposed, which leverages object attention to boost the confidence of small-sized defects and thereby generates more high-quality region proposals for them. Finally, to speed up the joint optimization of the multi-task framework, an adaptive loss weighting algorithm is proposed that learns the optimal combination of the multi-task loss functions by maintaining gradient direction consistency and tuning each task’s loss magnitude. Experimental results on two public defect datasets demonstrate that the proposed method outperforms other state-of-the-art methods.
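The abstract does not detail the adaptive loss weighting algorithm itself. As a minimal, hedged sketch of the two ideas it names (maintaining gradient direction consistency across tasks and balancing each task's loss magnitude), the pure-Python fragment below weights tasks inversely to their loss magnitude and suppresses a task whose shared-layer gradient conflicts with the main task. The function names (`adaptive_weights`, `combined_loss`) and the inverse-magnitude weighting rule are illustrative assumptions, not the authors' algorithm.

```python
def cosine_similarity(g1, g2):
    """Cosine similarity between two gradient vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = sum(a * a for a in g1) ** 0.5
    n2 = sum(b * b for b in g2) ** 0.5
    return dot / (n1 * n2) if n1 and n2 else 0.0

def adaptive_weights(losses):
    """Weight each task inversely to its loss magnitude so no task dominates.

    Weights are renormalized so they sum to the number of tasks, keeping the
    overall loss scale comparable to a plain unweighted sum.
    """
    total = sum(losses)
    inv = [total / l for l in losses]        # larger loss -> smaller weight
    s = sum(inv)
    k = len(losses)
    return [k * w / s for w in inv]

def combined_loss(losses, shared_grads):
    """Combine task losses; zero out a task whose shared-layer gradient
    points against the main task (index 0), i.e. a direction conflict."""
    weights = adaptive_weights(losses)
    for i in range(1, len(losses)):
        if cosine_similarity(shared_grads[0], shared_grads[i]) < 0:
            weights[i] = 0.0  # conflicting gradient direction: suppress
    return sum(w * l for w, l in zip(weights, losses))
```

For example, with task losses `[4.0, 1.0]` the larger loss receives the smaller weight (`0.4` vs. `1.6`), and a task whose gradient opposes the main task contributes nothing to the combined loss for that step.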
Data availability
The DTU-Drone inspection dataset that supports the findings of this study is available at https://data.mendeley.com/datasets/hd96prn3nc. The NEUDET dataset is available at http://faculty.neu.edu.cn/yunhyan/NEU_surface_defect_database.html.
Acknowledgements
This work was supported by the Key-Area Research and Development Program of Guangdong Province under Grants 2020B1111010002 and 2018B010109001, the 2021 Guangdong Provincial Science and Technology Special Fund (“Big Project + Task List”) under Grant 210719145863737, the Guangdong Marine Economic Development Project under Grant GDNRC[2020]018, and the Laboratory of Autonomous Systems and Network Control of the Ministry of Education.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, H., Li, B., Tian, L. et al. An adaptive loss weighting multi-task network with attention-guide proposal generation for small size defect inspection. Vis Comput 40, 681–698 (2024). https://doi.org/10.1007/s00371-023-02809-x