Self Pre-training with Single-Scale Adapter for Left Atrial Segmentation

Tu, Can; Huang, Ziyan; Deng, Zhongying; Yang, Yuncheng; Ma, Chenglong; He, Junjun; Ye, Jin; Wang, Haoyu; Ding, Xiaowei

doi:10.1007/978-3-031-31778-1_3


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13586)

Included in the following conference series:

Challenge on Left Atrial and Scar Quantification and Segmentation


Abstract

Accurate Left Atrial (LA) segmentation from Late Gadolinium Enhancement Magnetic Resonance Imaging (LGE MRI) is fundamental to the diagnosis of Atrial Fibrillation (AF). Previous approaches have tended to address this problem by refining network architectures to leverage spatial priors in medical imaging. However, such priors are hard to model because of low image quality and the varied shapes of the LA. In this paper, we instead try to learn the priors through generation. The motivation is simple: if a model can generate or recover image content well, it has likely learned the priors well. With the priors built in, such a model can better segment the LA. Specifically, we investigate the self pre-training paradigm, i.e., models are pre-trained and fine-tuned on the same LGE MRI dataset, based on the Masked Autoencoder (MAE). In the pre-training stage, we use Vision Transformer (ViT) based autoencoders to perform the pretext task of reconstructing the original MRI images from only a subset of patches, which encourages the ViT encoder to learn contextual information as priors by aggregating global information to recover the contents of the masked patches. In the fine-tuning stage, we further propose a single-scale adapter for the downstream task. The adapter first uses several branches with different numbers of upsampling blocks to remedy the plain, non-hierarchical property of the ViT, which better adapts the ViT to dense prediction tasks. It then constructs a feature pyramid directly from the single-scale feature map of the ViT using the multi-scale features from the different branches. Finally, the adapter incorporates a decoder that predicts the segmentation results from the feature pyramid. The proposed model (called ViTUNet) outperforms a baseline trained from scratch and the widely used nnUNet model. The final trained model achieves validation scores of 0.89013, 1.70567 and 17.12375 for the Dice coefficient, ASD and HD metrics, respectively.
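The masked-reconstruction pretext task described in the abstract can be illustrated with a minimal, library-free sketch. It is not the authors' implementation: the function names (`patchify`, `random_mask`, `masked_mse`), the 2D toy image, the patch size and the 0.75 mask ratio are all assumptions chosen for illustration, and scoring the reconstruction on masked patches only follows the general MAE recipe rather than anything stated in this chapter.

```python
import random


def patchify(image, patch):
    """Split a 2D image (list of rows) into non-overlapping patch x patch tiles, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


def random_mask(num_patches, mask_ratio, rng):
    """Randomly hide int(num_patches * mask_ratio) patches; return (visible_ids, masked_ids)."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_mse(original, reconstructed, masked_ids):
    """Mean squared error computed over the masked patches only (assumed MAE-style loss)."""
    total, count = 0.0, 0
    for i in masked_ids:
        for row_o, row_r in zip(original[i], reconstructed[i]):
            for a, b in zip(row_o, row_r):
                total += (a - b) ** 2
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 8x8 "image" split into four 4x4 patches (hypothetical sizes).
    image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    tiles = patchify(image, 4)
    visible, masked = random_mask(len(tiles), 0.75, random.Random(0))
    # An encoder would see only the visible patches; a decoder would predict the masked ones.
    # Here an all-zero "reconstruction" stands in for the decoder output.
    zeros = [[[0.0] * 4 for _ in range(4)] for _ in tiles]
    print("visible:", visible, "masked:", masked)
    print("loss on masked patches:", masked_mse(tiles, zeros, masked))
```

In this sketch the encoder and decoder are omitted entirely; the point is only that the model is shown a small subset of patches and is penalised on how well it recovers the hidden ones, which is what pushes it to aggregate global context.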


Notes

1. https://zmiclab.github.io/projects/lascarqs22/


Author information

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Can Tu, Ziyan Huang, Yuncheng Yang, Haoyu Wang & Xiaowei Ding
Shanghai AI Lab, Shanghai, China
Can Tu, Ziyan Huang, Yuncheng Yang, Junjun He, Jin Ye & Haoyu Wang
University of Surrey, Guildford, GU2 7XH, UK
Zhongying Deng
Fudan University, Shanghai, China
Chenglong Ma


Corresponding author

Correspondence to Xiaowei Ding.

Editor information

Editors and Affiliations

Fudan University, Shanghai, China
Xiahai Zhuang
University of Oxford, Oxford, UK
Lei Li
Fudan University, Shanghai, China
Sihan Wang
University of Oxford, Oxford, UK
Fuping Wu


About this paper

Cite this paper

Tu, C. et al. (2023). Self Pre-training with Single-Scale Adapter for Left Atrial Segmentation. In: Zhuang, X., Li, L., Wang, S., Wu, F. (eds) Left Atrial and Scar Quantification and Segmentation. LAScarQS 2022. Lecture Notes in Computer Science, vol 13586. Springer, Cham. https://doi.org/10.1007/978-3-031-31778-1_3


DOI: https://doi.org/10.1007/978-3-031-31778-1_3
Published: 05 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31777-4
Online ISBN: 978-3-031-31778-1
eBook Packages: Computer Science, Computer Science (R0)
