Abstract
Medical image segmentation benefits from understanding 3D context, but 3D networks are prone to over-fitting because annotated data are scarce. This paper investigates self-supervised pre-training, i.e., using unlabeled medical data to initialize 3D segmentation networks. We build our system upon contrastive learning, whose reliance on positive and negative samples prevents it from achieving satisfactory performance on medical image datasets, which contain relatively few samples. To alleviate this issue, we present a novel proxy task that exploits the anatomical similarity of human bodies across medical scans and defines sub-volumes taken from the same position in different cases as Semi-Positive samples. Pre-trained on a mixed dataset of 1254 CT volumes, the proposed approach, VoxSeP, transfers well to 4 downstream datasets with 2 different backbone networks. In fully supervised and semi-supervised fine-tuning, VoxSeP achieves average improvements of 2% and 4%, respectively, surpassing several state-of-the-art counterparts.
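The Semi-Positive idea can be illustrated with a small, self-contained sketch. This is not the VoxSeP implementation. The helper names (`same_position_origins`, `semi_positive_info_nce`), the assumption that cases are roughly aligned so that one shared relative coordinate lands on similar anatomy in every scan, and the soft-target weighting `semi_weight` are all illustrative assumptions. Embeddings are plain Python lists standing in for network features.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def same_position_origins(
    shapes: List[Tuple[int, int, int]],
    crop: Tuple[int, int, int],
    rng: random.Random,
) -> List[Tuple[int, int, int]]:
    """Map one random *relative* position onto every case (hypothetical helper).

    Assumes the scans share a roughly aligned field of view, so the same
    relative position covers similar anatomy. Crops taken at these origins in
    *other* cases would act as semi-positives for the anchor case.
    """
    rel = [rng.random() for _ in range(3)]
    origins = []
    for shape in shapes:
        # Scale into [0, s - c] per axis so the crop stays inside the volume.
        origin = tuple(
            int(round(r * max(s - c, 0))) for r, s, c in zip(rel, shape, crop)
        )
        origins.append(origin)
    return origins


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity with a tiny epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def semi_positive_info_nce(
    anchor: Vec,
    positive: Vec,
    semi_positives: List[Vec],
    negatives: List[Vec],
    tau: float = 0.1,
    semi_weight: float = 0.5,
) -> float:
    """InfoNCE with a soft target that gives semi-positives partial credit.

    Assumption (not the paper's stated loss): the true positive keeps
    (1 - semi_weight) of the target mass, the semi-positives share semi_weight
    equally, and negatives get zero. With no semi-positives this reduces to
    plain InfoNCE.
    """
    candidates = [positive] + list(semi_positives) + list(negatives)
    logits = [cosine(anchor, c) / tau for c in candidates]
    # Numerically stable log-softmax over all candidates.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    log_probs = [l - log_z for l in logits]

    k = len(semi_positives)
    if k == 0:
        targets = [1.0] + [0.0] * len(negatives)
    else:
        targets = [1.0 - semi_weight] + [semi_weight / k] * k + [0.0] * len(negatives)
    # Cross-entropy between the soft target and the predicted distribution.
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    rng = random.Random(0)
    print(same_position_origins([(96, 192, 192), (80, 160, 160)], (32, 64, 64), rng))
    print(semi_positive_info_nce([1.0, 0.0], [0.9, 0.1], [[0.7, 0.3]], [[-1.0, 0.0], [0.0, 1.0]]))
```

Giving semi-positives a fractional target, rather than treating them as full positives or as negatives, is one way to reflect that different patients' anatomy at the same position is similar but not identical; in this sketch `semi_weight` controls that trade-off.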
Acknowledgements
This research was supported by the National Natural Science Foundation of China (61871004) and the Project of Chinese Academy of Sciences (E141020).
Author information
Contributions
ZY, LX, WZ, and XH wrote the main manuscript text and prepared the figures and tables. ZY and WZ conducted the fully supervised fine-tuning experiments, ZY and Xinyue Huo conducted the semi-supervised fine-tuning experiments, and ZY conducted the ablation study and visualizations. LW helped ZY with the re-implementation of the comparison methods. LX revised the manuscript. JL, QT, and ST provided guidance on the idea and methods. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yang, Z., Xie, L., Zhou, W. et al. VoxSeP: semi-positive voxels assist self-supervised 3D medical segmentation. Multimedia Systems 29, 33–48 (2023). https://doi.org/10.1007/s00530-022-00977-9
DOI: https://doi.org/10.1007/s00530-022-00977-9