Combining frequency transformer and CNNs for medical image segmentation

Multimedia Tools and Applications

Abstract

Image segmentation is one of the most challenging tasks in digital image processing. It has many medical applications, such as the segmentation of cancerous tumors, organs, and abnormalities. Recent techniques that combine convolution-based models with transformers have been proposed for automatic medical image segmentation. These techniques achieve good results but require considerable time and computing resources. In this paper, we propose a new model for medical image segmentation that combines CNNs and frequency transformers in parallel to minimize the number of parameters and reduce computation time. The model is composed of two main branches and is able to learn the global-local feature interactions present in a medical image. The first branch, based on a Frequency Transformer (FT), employs the Fourier transform instead of multi-head attention to capture global dependencies. The second branch, a shallow convolutional neural network (CNN), extracts rich local information. Despite its small number of parameters, the proposed model was tested on several public medical image databases and achieves state-of-the-art results on lesion and tumor segmentation tasks.
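
The two-branch idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the function names (`fourier_mixing`, `local_conv3x3`, `parallel_block`), the single-channel 2D feature map, the fixed 3x3 kernel, and the fusion by simple addition are all assumptions made for illustration. The sketch only shows why a Fourier transform can stand in for attention as a parameter-free global mixer, while a small convolution stays local.

```python
import numpy as np


def fourier_mixing(feature_map):
    """Global branch (assumed form): 2D DFT over both axes, keeping the real part.

    There are no learnable weights here, and every output value depends on
    every input value, which is what gives the branch a global receptive field.
    """
    return np.fft.fft2(feature_map).real


def local_conv3x3(feature_map, kernel):
    """Local branch (assumed form): zero-padded 3x3 cross-correlation.

    Each output value depends only on the 3x3 neighbourhood around it.
    """
    h, w = feature_map.shape
    padded = np.pad(feature_map, 1)  # one pixel of zero padding on each side
    out = np.zeros_like(feature_map, dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def parallel_block(feature_map, kernel):
    """Run both branches on the same input and fuse them by addition.

    Addition is a placeholder; the fusion used in the paper is not stated
    in the abstract.
    """
    return fourier_mixing(feature_map) + local_conv3x3(feature_map, kernel)


if __name__ == "__main__":
    image = np.zeros((8, 8))
    image[5, 5] = 1.0                 # a single bright pixel far from the corner
    kernel = np.ones((3, 3)) / 9.0    # hypothetical averaging kernel
    print("global response at (0, 0):", fourier_mixing(image)[0, 0])
    print("local response at (0, 0):", local_conv3x3(image, kernel)[0, 0])
```

In this toy setting the distant pixel changes the global branch's output at the corner but leaves the local branch's output there untouched, which is the complementary behaviour the abstract attributes to the two branches.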

Data Availability

All datasets used in this work are publicly available at their respective links.

Author information

Contributions

Ismayl Labbihi: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Resources; Software; Validation; Visualization; Writing - original draft. Othmane El Meslouhi: Conceptualization; Formal analysis; Investigation; Methodology; Resources; Validation; Supervision; Writing - review and editing. Mohamed Benaddy: Conceptualization; Formal analysis; Investigation; Methodology; Validation; Supervision; Writing - review and editing. Mustapha Kardouchi: Project administration; Supervision; Validation; Visualization; Writing - review and editing; Resources; Data curation. Moulay Akhloufi: Project administration; Supervision; Validation; Visualization; Resources; Data curation.

Corresponding author

Correspondence to Ismayl Labbihi.

Ethics declarations

Conflicts of interest

The responsibility for the content of this article rests with its authors. The authors also report that there is no conflict of interest between them.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Othmane El Meslouhi, Mohamed Benaddy, Mustapha Kardouchi and Moulay Akhloufi contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Labbihi, I., El Meslouhi, O., Benaddy, M. et al. Combining frequency transformer and CNNs for medical image segmentation. Multimed Tools Appl 83, 21197–21212 (2024). https://doi.org/10.1007/s11042-023-16279-9

  • DOI: https://doi.org/10.1007/s11042-023-16279-9
