Abstract
Image segmentation is one of the most challenging tasks in digital image processing. It has many medical applications, such as the segmentation of cancerous tumors, organs, and abnormalities. Recent techniques that combine convolution-based models with transformers have been proposed for automatic medical image segmentation. These techniques achieve good results but require considerable time and computing resources. In this paper, we propose a new model for medical image segmentation that combines CNNs and frequency transformers in parallel to minimize the number of parameters and reduce computation time. The model is composed of two main branches and is able to learn the global-local feature interactions present in a medical image. The first branch, based on a Frequency Transformer (FT), employs the Fourier Transform instead of multi-head attention to capture global dependencies, while the second branch uses a shallow convolutional neural network (CNN) to extract rich local information. With a small number of parameters, the proposed model was tested on many public medical image databases and achieves state-of-the-art results for lesion/tumor segmentation tasks.
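To illustrate how a Fourier Transform can stand in for multi-head attention, the following is a minimal NumPy sketch of an FNet-style token-mixing block. It is not the authors' implementation: the function names, the layer sizes, the ReLU feed-forward layer, and the choice to keep only the real part of a 2D FFT are all assumptions made for illustration.

```python
import numpy as np

def fourier_mixing(tokens: np.ndarray) -> np.ndarray:
    """Mix information across tokens with a 2D FFT (assumed FNet-style).

    tokens has shape (num_tokens, hidden_dim). The FFT runs over both axes
    and only the real part is kept, so the mixing has no learned parameters.
    """
    return np.fft.fft2(tokens, axes=(0, 1)).real

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize each token vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def frequency_block(tokens: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """One hypothetical block: Fourier mixing, then a small feed-forward layer."""
    # Residual connection around the parameter-free Fourier mixing step.
    mixed = layer_norm(tokens + fourier_mixing(tokens))
    # Position-wise feed-forward layer with ReLU (sizes are illustrative).
    hidden = np.maximum(mixed @ w1, 0.0)
    return layer_norm(mixed + hidden @ w2)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))      # 16 image patches, 8 features each
w1 = rng.standard_normal((8, 32)) * 0.1    # hypothetical feed-forward weights
w2 = rng.standard_normal((32, 8)) * 0.1
out = frequency_block(tokens, w1, w2)
print(out.shape)  # (16, 8)
```

In this sketch the only trainable weights sit in the feed-forward layer, because the mixing step itself is a fixed transform. This is one way a Fourier-based branch can use fewer parameters than an attention-based one.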
Data Availability
All data are available at their respective links.
Author information
Contributions
Ismayl Labbihi: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Resources; Software; Validation; Visualization; Writing - original draft. Othmane El Meslouhi: Conceptualization; Formal analysis; Investigation; Methodology; Resources; Validation; Supervision; Writing - review and editing. Mohamed Benaddy: Conceptualization; Formal analysis; Investigation; Methodology; Validation; Supervision; Writing - review and editing. Mustapha Kardouchi: Project administration; Supervision; Validation; Visualization; Writing - review and editing; Resources; Data curation. Moulay Akhloufi: Project administration; Supervision; Validation; Visualization; Resources; Data curation.
Ethics declarations
Conflicts of interest
The responsibility for the content of this article rests with its authors. The authors also report that there is no conflict of interest among them.
Additional information
Othmane El Meslouhi, Mohamed Benaddy, Mustapha Kardouchi and Moulay Akhloufi contributed equally to this work.
About this article
Cite this article
Labbihi, I., El Meslouhi, O., Benaddy, M. et al. Combining frequency transformer and CNNs for medical image segmentation. Multimed Tools Appl 83, 21197–21212 (2024). https://doi.org/10.1007/s11042-023-16279-9
DOI: https://doi.org/10.1007/s11042-023-16279-9