
Multi-label noisy samples in underwater inspection from the oil and gas industry

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Deep learning has shown remarkable success in various machine learning tasks, including multi-label classification. Multi-label classification is a supervised task in which an input instance can be associated with multiple labels simultaneously, rather than exclusively one, as in the single-label scenario. When building a multi-label dataset for real-world applications, a recurrent problem is the presence of noisy labels. In this context, noisy labels refer to mislabeled data, which can weaken the performance of supervised models. Although this issue is well explored for single-label noise, it remains an emerging topic for multi-label applications. In this work, a novel deep learning model that handles multi-label noise is proposed, combining the Small Loss Approach Multi-label (SLAM) with a joint loss in order to automatically identify and rectify noisy labels. The model outperforms state-of-the-art (SOTA) models by \(2.5\%\) in F1-score on the noisy version of the UcMerced benchmark. A new open noisy version of the TreeSatAI benchmark was developed and is now released, on which the performance gains reached \(1.8\%\) in F1-score. Furthermore, the model reduced the presence of noise from \(25\%\) to \(5\%\) in both sets. In addition, we evaluate the model on a real-world application of underwater inspections, to assist with multi-label classification for an oil and gas company. Our model achieved F1-score gains of \(10\%\) compared to a standard model (without noise-handling techniques), and up to \(2.7\%\) and \(1.9\%\) compared to the SOTA models SLAM and JoCoR, respectively.
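The core mechanism the abstract describes — small-loss sample selection combined with a joint loss — can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the joint loss here follows a JoCoR-style formulation (supervised multi-label BCE of two networks plus a symmetric KL agreement term), and the function names, the `lam` weighting, and the fixed forget rate are assumptions made for illustration only.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    # Per-sample multi-label binary cross-entropy, averaged over labels.
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean(axis=1)

def joint_loss(p1, p2, y, lam=0.5, eps=1e-7):
    # Joint loss per sample: supervised BCE of both networks plus a
    # symmetric KL agreement (co-regularization) term between their
    # per-label Bernoulli predictions. `lam` trades off the two parts.
    sup = bce(p1, y) + bce(p2, y)
    a = np.clip(p1, eps, 1 - eps)
    b = np.clip(p2, eps, 1 - eps)
    kl_ab = (a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))).mean(axis=1)
    kl_ba = (b * np.log(b / a) + (1 - b) * np.log((1 - b) / (1 - a))).mean(axis=1)
    return (1 - lam) * sup + lam * (kl_ab + kl_ba)

def small_loss_select(losses, forget_rate):
    # Small-loss trick: keep the (1 - forget_rate) fraction of samples
    # with the smallest joint loss; these are treated as likely clean,
    # and the rest are candidates for label rectification.
    keep = int(np.ceil(len(losses) * (1 - forget_rate)))
    return np.argsort(losses)[:keep]

# Toy batch of 4 samples with 2 labels; sample 3 is mislabeled
# (both networks confidently predict labels its annotation denies).
y  = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
p1 = np.array([[.90, .10], [.10, .90], [.90, .90], [.90, .90]])
p2 = np.array([[.85, .15], [.15, .85], [.85, .85], [.85, .85]])

losses = joint_loss(p1, p2, y)
clean_idx = small_loss_select(losses, forget_rate=0.25)
# The mislabeled sample (index 3) incurs the largest joint loss
# and is excluded from the selected "clean" subset.
```

In a training loop, both networks would then be updated only on the selected subset, with the forget rate typically ramped up over the first epochs toward the estimated noise rate.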

Data Availability Statement

The benchmark dataset TreeSatAI used to evaluate our model is available at https://zenodo.org/records/6598391, and the noisy labels are available at https://github.com/ICA-PUC/TreeSatAINoiseDataset. The benchmark dataset UcMerced used to evaluate our model is available at http://weegee.vision.ucmerced.edu/datasets/landuse.html, and the noisy labels are available at https://github.com/ICA-PUC/UcMercedNoiseDataset. The dataset used in the real application of our model is private data and, in accordance with the determinations of the data owners, will not be made public, for corporate reasons. The code for the SLAM with joint loss models will not be made public due to the determinations of the data owners, for corporate reasons.

References

  1. You A, Kim JK, Ryu IH, Yoo TK (2022) Application of generative adversarial networks (gan) for ophthalmology image domains: a survey. Eye Vis 9(1):1–19

  2. Baek M, Baker D (2022) Deep learning and protein structure modeling. Nature Methods 19(1):13–14

  3. Yusoff M, Haryanto T, Suhartanto H, Mustafa WA, Zain JM, Kusmardi K (2023) Accuracy analysis of deep learning methods in breast cancer classification: A structured review. Diagnostics 13(4):683

  4. Yu J, Yin H, Xia X, Chen T, Li J, Huang Z (2023) Self-supervised learning for recommender systems: a survey. IEEE Trans Knowl Data Eng 23:535

  5. Liu W, Wang H, Shen X, Tsang IW (2021) The emerging trends of multi-label learning. IEEE Trans Patt Anal Mach Intell 44(11):7955–7974

  6. Haghighian Roudsari A, Afshar J, Lee W, Lee S (2022) Patentnet: multi-label classification of patent documents using deep learning based language understanding. Scientometrics 1:1–25

  7. Kaselimi M, Voulodimos A, Daskalopoulos I, Doulamis N, Doulamis A (2022) A vision transformer model for convolution-free multilabel classification of satellite imagery in deforestation monitoring. IEEE Trans Neural Netw Learn Syst 2:003

  8. Cheng X, Lin H, Wu X, Shen D, Yang F, Liu H, Shi N (2022) MLTR: multi-label classification with transformer. In: 2022 IEEE international conference on multimedia and expo (ICME), pp. 1–6. IEEE

  9. Han B, Yao Q, Yu X, Niu G, Xu M, Hu W, Tsang I, Sugiyama M (2018) Co-teaching: Robust training of deep neural networks with extremely noisy labels. Adv Neural Inform Process Syst 31:355

  10. Wei H, Feng L, Chen X, An B (2020) Combating noisy labels by agreement: A joint training method with co-regularization. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 13726–13735

  11. Yao Y, Sun Z, Zhang C, Shen F, Wu Q, Zhang J, Tang Z (2021) Jo-src: a contrastive approach for combating noisy labels. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5192–5201

  12. Yu X, Han B, Yao J, Niu G, Tsang I, Sugiyama M (2019) How does disagreement help generalization against label corruption? In: international conference on machine learning, pp. 7164–7173. PMLR

  13. Huang L, Zhang C, Zhang H (2021) Self-adaptive training: bridging the supervised and self-supervised learning. arXiv preprint arXiv:2101.08732

  14. Burgert T, Ravanbakhsh M, Demir B (2022) On the effects of different types of label noise in multi-label remote sensing image classification. IEEE Trans Geosci Remote Sens 60:1–13

  15. Sousa V, Pereira AL, Kohler M, Pacheco M (2023) Learning by small loss approach multi-label to deal with noisy labels. In: international conference on computational science and its applications, pp. 385–403. Springer

  16. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: computer vision–ECCV 2016 workshops: Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, proceedings, part II 14, pp. 850–865. Springer

  17. Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: proceedings of the eleventh annual conference on computational learning theory, pp. 92–100

  18. Yang Y, Newsam S (2010) Bag-of-visual-words and spatial extensions for land-use classification. In: proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, pp. 270–279

  19. Ahlswede S, Schulz C, Gava C, Helber P, Bischke B, Förster M, Arias F, Hees J, Demir B, Kleinschmit B (2022) Treesatai benchmark archive: a multi-sensor, multi-label dataset for tree species classification in remote sensing. Earth Syst Sci Data Discuss 2022:1–22

  20. Wei T, Shi JX, Tu WW, Li YF (2021) Robust long-tailed learning under label noise. arXiv preprint arXiv:2108.11569

  21. Ghosh A, Kumar H, Sastry PS (2017) Robust loss functions under label noise for deep neural networks. In: proceedings of the AAAI conference on artificial intelligence, vol. 31

  22. Liu S, Niles-Weed J, Razavian N, Fernandez-Granda C (2020) Early-learning regularization prevents memorization of noisy labels. Adv Neural Inform Process Syst 33:20331–20342

  23. Sun Z, Shen F, Huang D, Wang Q, Shu X, Yao Y, Tang J (2022) Pnp: Robust learning from noisy labels by probabilistic noise prediction. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5311–5320

  24. Inoue N, Simo-Serra E, Yamasaki T, Ishikawa H (2017) Multi-label fashion image classification with minimal human supervision. In: proceedings of the IEEE international conference on computer vision workshops, pp. 2261–2267

  25. Hu M, Han H, Shan S, Chen X (2019) Multi-label learning from noisy labels with non-linear feature transformation. In: computer vision–ACCV 2018: 14th asian conference on computer vision, Perth, Australia, December 2–6, 2018, revised selected papers, part V 14, pp. 404–419. Springer

  26. Menéndez M, Pardo J, Pardo L, Pardo M (1997) The jensen-shannon divergence. J Frank Inst 334(2):307–318

  27. Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2021) Understanding deep learning (still) requires rethinking generalization. Commun ACM 64(3):107–115

  28. Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. Adv Neural Inform Process Syst 33:18661–18673

  29. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  30. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. IEEE

  31. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

Download references

Acknowledgements

The authors would like to thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes), and Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio) for their financial support.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to First Vitor Sousa.

Ethics declarations

Conflict of interest

All authors state that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sousa, F.V., Pereira, S.A., Koher, T.M. et al. Multi-label noisy samples in underwater inspection from the oil and gas industry. Neural Comput & Applic 36, 6855–6873 (2024). https://doi.org/10.1007/s00521-024-09434-2
