SFOD-Trans: semi-supervised fine-grained object detection framework with transformer module

Liu, Quankai; Zhang, Guangyuan; Li, Kefeng; Zhou, Fengyu; Yu, Dexin

doi:10.1007/s11517-022-02682-1

Original Article
Published: 17 October 2022

Volume 60, pages 3555–3566, (2022)
Medical & Biological Engineering & Computing

Quankai Liu¹, Guangyuan Zhang¹, Kefeng Li¹ (ORCID: orcid.org/0000-0001-8278-6454), Fengyu Zhou² & Dexin Yu³

502 Accesses
1 Citation

Abstract

Because labeling medical images for object detection is very costly, semi-supervised learning methods for medical images are investigated. This paper proposes SFOD-Trans, a semi-supervised fine-grained object detection framework with a transformer module, for hepatic portal vein detection. The framework adopts Sparse R-CNN as the backbone. In the detection model, a transformer module is introduced and a contrastive loss is added to improve the performance of fine-grained object detection. To transfer information between labeled and unlabeled images, a new fusion module named normalized ROI fusion (NRF) is designed based on the characteristics of the hepatic portal vein. Extensive experiments are run on a dataset of 1000 real CT scans. With 300 labeled and 1500 unlabeled samples, the proposed method reaches an Average Precision (AP) of 0.773 and an Average Recall (AR) of 0.831.
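
The abstract names a contrastive loss but does not specify its form. As a rough illustration only, the sketch below implements a generic InfoNCE-style contrastive loss over plain Python lists. The function names, the use of cosine similarity, and the temperature value are assumptions made for this example and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical helper, not the paper's loss).

    Computes -log( exp(s_pos / t) / (exp(s_pos / t) + sum_i exp(s_neg_i / t)) ),
    where s_* are cosine similarities to the anchor and t is the temperature.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    neg_logits = [cosine_similarity(anchor, n) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - pos_logit


if __name__ == "__main__":
    # The anchor matches the positive and is orthogonal to the negative.
    aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
    # The anchor matches the negative instead, so the loss is much larger.
    confused = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
    print(aligned, confused)
```

The loss is small when the anchor embedding is closer to its positive than to any negative and grows when a negative is closer, which is the general behavior a contrastive term is meant to encourage between visually similar fine-grained regions.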

Graphic abstract

An overview of the semi-supervised fine-grained object detection framework with transformer module (SFOD-Trans). Two parallel branches compute the supervised loss and the semi-supervised loss, respectively.
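
The caption describes two parallel branches, one for the supervised loss and one for the semi-supervised loss, but the page does not say how the two are combined. The sketch below shows one common way to do it: a weighted sum whose unsupervised weight ramps up linearly during training. The function names, the linear ramp-up schedule, and the default values are assumptions for illustration and are not the authors' stated method.

```python
def mean(values):
    """Arithmetic mean of a sequence; returns 0.0 for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def linear_ramp_up(step, ramp_steps, max_weight):
    """Weight that grows linearly from 0 to max_weight over ramp_steps steps."""
    if ramp_steps <= 0:
        return max_weight
    fraction = min(1.0, max(0.0, step / ramp_steps))
    return max_weight * fraction


def total_loss(labeled_losses, unlabeled_losses, step,
               ramp_steps=1000, max_weight=1.0):
    """Combine the two branches (hypothetical scheme, assumed for this sketch).

    labeled_losses:   per-sample losses from the supervised branch
    unlabeled_losses: per-sample losses from the semi-supervised branch
    """
    supervised = mean(labeled_losses)
    unsupervised = mean(unlabeled_losses)
    weight = linear_ramp_up(step, ramp_steps, max_weight)
    return supervised + weight * unsupervised


if __name__ == "__main__":
    for step in (0, 500, 2000):
        print(step, total_loss([1.0, 3.0], [2.0, 4.0], step))
```

Early in training the unlabeled branch contributes little, so noisy predictions on unlabeled scans do not dominate the gradient; later its contribution reaches the full `max_weight`.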



Author information

Authors and Affiliations

School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan, 250357, China
Quankai Liu, Guangyuan Zhang & Kefeng Li
School of Control Science and Engineering, Shandong University, Jinan, China
Fengyu Zhou
Department of Radiology, Qilu Hospital of Shandong University, Jinan, 250000, People’s Republic of China
Dexin Yu

Corresponding author

Correspondence to Kefeng Li.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Q., Zhang, G., Li, K. et al. SFOD-Trans: semi-supervised fine-grained object detection framework with transformer module. Med Biol Eng Comput 60, 3555–3566 (2022). https://doi.org/10.1007/s11517-022-02682-1

Received: 02 November 2021
Accepted: 24 September 2022
Published: 17 October 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11517-022-02682-1
