Abstract
Because labeling medical images for object detection is very costly, semi-supervised learning methods for medical images are being investigated. In this paper, a semi-supervised fine-grained object detection framework with a transformer module (SFOD-Trans) is proposed for hepatic portal vein detection. It adopts Sparse R-CNN as the backbone. In the detection model, a transformer module is introduced and a contrastive loss is added to improve the performance of fine-grained object detection. To transfer information between labeled and unlabeled images, a new fusion module named normalized ROI fusion (NRF) is designed based on the characteristics of the hepatic portal vein. We run a large number of experiments on a dataset of 1000 real CT scans. The results show that the Average Precision (AP) and Average Recall (AR) of the proposed method reach 0.773 and 0.831, respectively, with 300 labeled and 1500 unlabeled samples.
Graphic abstract
An overview of the semi-supervised fine-grained object detection framework with transformer module (SFOD-Trans). Two parallel branches compute the supervised loss and the semi-supervised loss, respectively.
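The two-branch design can be illustrated with a minimal sketch of how a supervised loss and a semi-supervised loss might be combined into one training objective. The abstract does not state how SFOD-Trans combines the two terms, so the weighted sum, the linear ramp-up schedule, and the names `total_loss`, `ramp_up_weight`, `unsup_weight`, and `ramp_steps` are all assumptions chosen for illustration. They reflect a common practice in semi-supervised learning, not the authors' implementation.

```python
def ramp_up_weight(step: int, ramp_steps: int, max_weight: float = 1.0) -> float:
    """Hypothetical linear warm-up for the semi-supervised loss weight.

    The weight grows from 0 to max_weight over ramp_steps training steps,
    so that early, unreliable predictions on unlabeled data count less.
    """
    if ramp_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / ramp_steps)


def total_loss(supervised_loss: float,
               semi_supervised_loss: float,
               unsup_weight: float = 1.0) -> float:
    """Assumed objective: supervised term plus weighted semi-supervised term."""
    if unsup_weight < 0:
        raise ValueError("unsup_weight must be non-negative")
    return supervised_loss + unsup_weight * semi_supervised_loss


if __name__ == "__main__":
    # Illustrative loss values only; they are not results from the paper.
    for step in (0, 50, 100, 200):
        w = ramp_up_weight(step, ramp_steps=100)
        print(step, w, total_loss(1.0, 2.0, w))
```

With a schedule of this kind, the labeled branch dominates early training and the unlabeled branch contributes fully once the ramp-up period ends.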







About this article
Cite this article
Liu, Q., Zhang, G., Li, K. et al. SFOD-Trans: semi-supervised fine-grained object detection framework with transformer module. Med Biol Eng Comput 60, 3555–3566 (2022). https://doi.org/10.1007/s11517-022-02682-1
DOI: https://doi.org/10.1007/s11517-022-02682-1