Cross Attention Multi Scale CNN-Transformer Hybrid Encoder Is General Medical Image Learner

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14437)


Abstract

Medical image segmentation plays a crucial role in medical artificial intelligence. Recent advances in computer vision have introduced multiscale ViTs (Vision Transformers), which offer robust and powerful feature extraction. However, because a ViT processes image patches independently, it often pays insufficient attention to fine detail. In medical image segmentation tasks such as organ and tumor segmentation, precise boundary delineation is of utmost importance. To address this challenge, this study proposes two novel CNN-Transformer feature fusion modules: SFM (Shallow Fusion Module) and DFM (Deep Fusion Module). These modules effectively integrate high-level and low-level semantic information from the feature pyramid while maintaining network efficiency. To expedite network convergence, deep supervision is introduced during the training phase. Additionally, extensive ablation experiments and comparative studies are conducted on the well-known public Synapse and ACDC datasets to evaluate the effectiveness of the proposed approach. The experimental results demonstrate the efficacy of the proposed modules and training method, and they show the superiority of our architecture over previous methods. The code and trained models will be available soon.
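To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of (1) cross-attention fusion between a CNN feature map and Transformer tokens and (2) a deep-supervision loss over multi-scale side outputs. This is an illustrative sketch only: the class and function names (CrossAttentionFusion, deep_supervision_loss) and all hyperparameters are assumptions, not the paper's actual SFM/DFM implementation.

```python
# Illustrative sketch; not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttentionFusion(nn.Module):
    """Fuse a CNN feature map (local detail) with ViT tokens (global context)
    via cross attention. The CNN features act as queries; the ViT tokens act
    as keys and values."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, cnn_feat: torch.Tensor, vit_tokens: torch.Tensor) -> torch.Tensor:
        # cnn_feat: (B, C, H, W); vit_tokens: (B, N, C)
        b, c, h, w = cnn_feat.shape
        q = cnn_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) queries
        kv = self.norm_kv(vit_tokens)
        fused, _ = self.attn(self.norm_q(q), kv, kv)
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        # Residual connection preserves the CNN's fine boundary detail.
        return self.proj(fused) + cnn_feat


def deep_supervision_loss(side_outputs, target):
    """Supervise every decoder depth directly: each side output is upsampled
    to the label resolution and scored, which speeds up convergence."""
    loss = 0.0
    for logits in side_outputs:  # each: (B, num_classes, h_i, w_i)
        logits = F.interpolate(logits, size=target.shape[-2:],
                               mode="bilinear", align_corners=False)
        loss = loss + F.cross_entropy(logits, target)  # target: (B, H, W) long
    return loss / len(side_outputs)
```

As a usage example, a decoder producing side outputs at 1/4, 1/8, and 1/16 resolution would pass all three logits tensors to deep_supervision_loss along with the full-resolution label map; at inference time only the finest output is kept.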


Notes

  1. This work is supported by the Natural Science Foundation of China (No. 62072388), the Industry Guidance Project Foundation of the Science and Technology Bureau of Fujian Province in 2020 (No. 2020H0047), and the Fujian Sunshine Charity Foundation.


Author information


Correspondence to Junfeng Yao or Qingqi Hong.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhou, R., Yao, J., Hong, Q., Li, X., Cao, X. (2024). Cross Attention Multi Scale CNN-Transformer Hybrid Encoder Is General Medical Image Learner. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14437. Springer, Singapore. https://doi.org/10.1007/978-981-99-8558-6_8

  • DOI: https://doi.org/10.1007/978-981-99-8558-6_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8557-9

  • Online ISBN: 978-981-99-8558-6

  • eBook Packages: Computer Science, Computer Science (R0)
