VoxSeP: semi-positive voxels assist self-supervised 3D medical segmentation

  • Regular Paper
  • Multimedia Systems

Abstract

Medical image segmentation benefits from understanding 3D context, but 3D networks are prone to over-fitting because annotated data are limited. This paper investigates self-supervised pre-training, i.e., using unlabeled medical data to initialize 3D segmentation networks. We build our system upon contrastive learning, whose dependence on positive and negative samples prevents it from achieving satisfactory performance on medical image datasets with few samples. To alleviate this issue, we present a novel proxy task that exploits the similarity of human bodies across medical scans and defines sub-volumes taken from the same position in different cases as semi-positive samples. Pre-trained on a mixed dataset containing 1254 CT volumes, the proposed approach, VoxSeP, transfers well to 4 downstream datasets with 2 different backbone networks. With both fully supervised and semi-supervised fine-tuning, VoxSeP achieves favorable average improvements (\(2\%\) and \(4\%\)), surpassing several state-of-the-art counterparts.
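
As a rough illustration of the semi-positive idea, the sketch below picks one relative position and maps it to crop bounds in two different cases. It is not the authors' implementation: the abstract does not say how positions are matched across scans, so the use of fractional coordinates, the assumption that the scans are roughly aligned, and the names `crop_bounds` and `semi_positive_bounds` are all hypothetical.

```python
import random


def crop_bounds(shape, rel_center, size):
    """Return (start, stop) index pairs for a crop of `size` voxels centred at
    the fractional position `rel_center` (each coordinate in [0, 1])."""
    bounds = []
    for dim, rel, length in zip(shape, rel_center, size):
        if length > dim:
            raise ValueError("crop is larger than the volume")
        # Convert the fractional centre to a voxel index in this volume.
        center = round(rel * (dim - 1))
        # Clamp the start so the crop stays fully inside the volume.
        start = min(max(center - length // 2, 0), dim - length)
        bounds.append((start, start + length))
    return bounds


def semi_positive_bounds(shape_a, shape_b, size, rng):
    """Pick one relative position and map it into two different cases.

    Assumption (not from the paper): the two scans are roughly aligned, so the
    same fractional position covers similar anatomy in both.
    """
    rel_center = [rng.random() for _ in size]
    return (crop_bounds(shape_a, rel_center, size),
            crop_bounds(shape_b, rel_center, size))


# Two hypothetical CT volumes with different shapes (depth, height, width).
rng = random.Random(0)
bounds_a, bounds_b = semi_positive_bounds(
    (40, 64, 64), (50, 80, 72), (16, 32, 32), rng
)
```

The two sets of bounds can then be used to slice the corresponding volumes; the resulting sub-volumes have the same size and would be treated as a semi-positive pair rather than as a plain negative pair.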

Notes

  1. We use VoxSeP to denote the whole self-supervised pre-training method in Tables 1 to 4; in Table 5, it refers only to the VoxSeP pretext task \({\mathcal {T}}_\mathrm {VoxSeP}\) built upon semi-positive contrastive pairs.

Acknowledgements

The research work is supported by the National Natural Science Foundation of China (61871004) and the Project of Chinese Academy of Sciences (E141020).

Author information

Contributions

ZY, LX, WZ, and XH wrote the main manuscript text and prepared the figures and tables. ZY and WZ conducted the fully supervised fine-tuning experiments; ZY and XH (Xinyue Huo) conducted the semi-supervised fine-tuning experiments; ZY conducted the ablation study and visualizations. LW helped ZY with the re-implementation of the comparison methods. LX revised the manuscript. JL, QT, and ST provided guidance and instruction on the idea and methods. All authors reviewed the manuscript.

Corresponding author

Correspondence to Sheng Tang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, Z., Xie, L., Zhou, W. et al. VoxSeP: semi-positive voxels assist self-supervised 3D medical segmentation. Multimedia Systems 29, 33–48 (2023). https://doi.org/10.1007/s00530-022-00977-9

  • DOI: https://doi.org/10.1007/s00530-022-00977-9
