A Super Token Vision Transformer and CNN Parallel Branch Network for mCNV Lesion Segmentation in OCT Images

  • Conference paper
Machine Learning in Medical Imaging (MLMI 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14348)

Abstract

Myopic choroidal neovascularization (mCNV) is a vision-threatening complication of high myopia characterized by the growth of abnormal blood vessels in the choroid layer of the eye. In OCT images, mCNV typically presents as a highly reflective area within the subretinal layer. Accurate segmentation of mCNV in OCT images can therefore help clinicians assess disease status and guide treatment decisions. However, such segmentation is highly challenging because of noise interference, complex lesion areas, and low contrast. To address this, we propose a parallel-branch network architecture that combines a super token vision transformer (STViT) and a CNN to capture global dependencies and low-level feature details more efficiently. The super token attention (STA) mechanism in STViT reduces the number of tokens in self-attention while preserving global modeling. Additionally, we design a novel feature fusion module that uses depth-wise separable convolutions to efficiently fuse multi-level features from the two pathways. We conduct extensive experiments on an in-house OCT dataset and a public OCT dataset, and the results demonstrate that our proposed method achieves state-of-the-art segmentation performance.
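
The two efficiency claims in the abstract can be illustrated with simple counting. The sketch below is not the authors' implementation. It only compares the number of weights in a standard convolution with a depth-wise separable one, and the number of pairwise attention scores over all tokens with the number over a smaller set of super tokens. The channel widths (256 in, 128 out), the 3 × 3 kernel, the 56 × 56 token map, and the 8 × 8 super-token grid are hypothetical values chosen for illustration, and the attention count ignores the cost of associating tokens with super tokens.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depth-wise conv followed by a 1 x 1 point-wise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def super_token_count(height, width, grid):
    """Number of super tokens if each covers a grid x grid patch of tokens (assumed layout)."""
    return (height // grid) * (width // grid)


def attention_pairs(num_tokens):
    """Pairwise scores computed by dense self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


# Hypothetical fusion layer: concatenated features from two branches (128 + 128 channels).
standard = conv_params(256, 128, 3)
separable = depthwise_separable_params(256, 128, 3)
print("standard conv weights:      ", standard)
print("depth-wise separable weights:", separable)

# Hypothetical 56 x 56 token map grouped into 8 x 8 patches.
tokens = 56 * 56
supers = super_token_count(56, 56, 8)
print("dense attention pairs:      ", attention_pairs(tokens))
print("super-token attention pairs:", attention_pairs(supers))
```

Under these assumed sizes the separable layer needs 35,072 weights instead of 294,912, and attention among 49 super tokens computes 2,401 scores instead of 9,834,496 over all 3,136 tokens.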

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (No. U22A2024, 62106153, 82271103), Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515110605, 2022A1515012326) and Natural Science Foundation of Shenzhen (No. JCYJ20220818095809021).

Author information

Corresponding authors

Correspondence to Guoming Zhang or Baiying Lei.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dong, X. et al. (2024). A Super Token Vision Transformer and CNN Parallel Branch Network for mCNV Lesion Segmentation in OCT Images. In: Cao, X., Xu, X., Rekik, I., Cui, Z., Ouyang, X. (eds) Machine Learning in Medical Imaging. MLMI 2023. Lecture Notes in Computer Science, vol 14348. Springer, Cham. https://doi.org/10.1007/978-3-031-45673-2_27

  • DOI: https://doi.org/10.1007/978-3-031-45673-2_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45672-5

  • Online ISBN: 978-3-031-45673-2

  • eBook Packages: Computer Science, Computer Science (R0)
