A Super Token Vision Transformer and CNN Parallel Branch Network for mCNV Lesion Segmentation in OCT Images

Dong, Xiang; Xie, Hai; Sun, Yunlong; Wu, Zhenquan; Yang, Bao; Qu, Junlong; Zhang, Guoming; Lei, Baiying

doi:10.1007/978-3-031-45673-2_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14348)

Included in the following conference series:

International Workshop on Machine Learning in Medical Imaging

Abstract

Myopic choroidal neovascularization (mCNV) is a vision-threatening complication of high myopia characterized by the growth of abnormal blood vessels in the choroid layer of the eye. In optical coherence tomography (OCT) images, mCNV typically presents as a highly reflective area within the subretinal layer. Accurate segmentation of mCNV in OCT images can therefore help clinicians assess disease status and guide treatment decisions. However, such segmentation is highly challenging because of noise interference, complex lesion areas, and low contrast. To address this, we propose a parallel-branch network architecture that combines a super token vision transformer (STViT) with a CNN to capture global dependencies and low-level feature details more efficiently. The super token attention (STA) mechanism in STViT reduces the number of tokens involved in self-attention while preserving global modeling. We also design a novel feature fusion module that uses depth-wise separable convolutions to efficiently fuse multi-level features from the two pathways. We conduct extensive experiments on an in-house OCT dataset and a public OCT dataset, and the results demonstrate that our proposed method achieves state-of-the-art segmentation performance.
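The two efficiency claims in the abstract can be illustrated with some rough arithmetic. The sketch below is not the authors' implementation: the feature-map size, the super-token grid size, and the channel counts are all hypothetical values chosen for illustration, and the attention count covers only the attention among super tokens, ignoring the cost of associating tokens with super tokens. It compares the number of query-key pairs scored with and without token reduction, and the weight count of a standard convolution against a depth-wise separable one (biases omitted).

```python
def attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


def super_token_count(height: int, width: int, grid: int) -> int:
    """Tokens left if each grid x grid patch is summarized by one super token."""
    if height % grid or width % grid:
        raise ValueError("feature map must divide evenly into the grid")
    return (height // grid) * (width // grid)


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depth-wise k x k convolution plus a 1 x 1 point-wise one."""
    return k * k * c_in + c_in * c_out


# Hypothetical sizes, not taken from the paper.
H, W, GRID = 56, 56, 8        # feature map and super-token grid size
C_IN, C_OUT, K = 128, 64, 3   # e.g. two 64-channel branches concatenated

tokens = H * W
supers = super_token_count(H, W, GRID)
print("tokens:", tokens, "-> super tokens:", supers)
print("attention pairs:", attention_pairs(tokens), "->", attention_pairs(supers))
print("conv weights:", standard_conv_params(C_IN, C_OUT, K),
      "->", separable_conv_params(C_IN, C_OUT, K))
```

Under these assumed sizes, attention among super tokens scores 8⁴ = 4096 times fewer pairs than full self-attention, and the separable convolution uses roughly an eighth of the weights of the standard one.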

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (Nos. U22A2024, 62106153, 82271103), the Guangdong Basic and Applied Basic Research Foundation (Nos. 2020A1515110605, 2022A1515012326), and the Natural Science Foundation of Shenzhen (No. JCYJ20220818095809021).

Author information

Authors and Affiliations

School of Biomedical Engineering, Health Science Center, National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, Shenzhen University, Shenzhen, China
Xiang Dong, Hai Xie, Yunlong Sun, Bao Yang, Junlong Qu & Baiying Lei
Shenzhen Eye Hospital, Jinan University, Shenzhen Eye Institute, Shenzhen, Guangdong, China
Zhenquan Wu & Guoming Zhang

Corresponding authors

Correspondence to Guoming Zhang or Baiying Lei.

Editor information

Editors and Affiliations

Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
Xiaohuan Cao
Rensselaer Polytechnic Institute, Troy, NY, USA
Xuanang Xu
Imperial College London, London, UK
Islem Rekik
ShanghaiTech University, Shanghai, China
Zhiming Cui
Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
Xi Ouyang

About this paper

Cite this paper

Dong, X. et al. (2024). A Super Token Vision Transformer and CNN Parallel Branch Network for mCNV Lesion Segmentation in OCT Images. In: Cao, X., Xu, X., Rekik, I., Cui, Z., Ouyang, X. (eds) Machine Learning in Medical Imaging. MLMI 2023. Lecture Notes in Computer Science, vol 14348. Springer, Cham. https://doi.org/10.1007/978-3-031-45673-2_27

DOI: https://doi.org/10.1007/978-3-031-45673-2_27
Published: 15 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45672-5
Online ISBN: 978-3-031-45673-2
eBook Packages: Computer Science, Computer Science (R0)
