TANet: Triple Attention Network for medical image segmentation

https://doi.org/10.1016/j.bspc.2023.104608

Highlights

  • Helps clinicians make diagnoses by automatically locating and marking diseased tissues.

  • Low-level features contribute little to segmentation but incur substantial computational cost.

  • Correlation between channels and spatial positions can be exploited for medical image segmentation.

  • Attention mechanisms allow networks to focus on the areas of interest.

  • Selecting the most adaptable scale features can solve the scale adaptation problem.

Abstract

In recent years, deep learning-based methods have achieved remarkable progress in medical image processing, such as polyp segmentation in colonoscopy images and skin lesion segmentation in dermoscopy images. However, current state-of-the-art medical segmentation methods still suffer from low accuracy when segmenting small-scale and variable-scale objects. To solve this problem, we propose the Triple Attention Network (TANet). In TANet, a novel Triple Attention Module (TAM) is presented. TAM has two sub-modules: the Multi-scale Feature Selection Module (MFSM) and the Contextual Feature Extraction Module (CFEM). MFSM extracts more adaptable multi-scale features for capturing variable-scale objects, while CFEM captures small-scale objects by extracting contextual features. TAM combines MFSM and CFEM to enhance segmentation performance on medical images with small-scale and variable-scale lesions. Extensive experiments are conducted on five polyp datasets and one skin lesion dataset. The results show that the proposed model outperforms the previous state-of-the-art models on most evaluation metrics and improves the Dice score by up to 7.1%. All results consistently confirm the effectiveness of the proposed TANet and show that it achieves state-of-the-art performance on these datasets.

Introduction

With the advent of digital medical imaging equipment, the application of image processing technology in medical image analysis has received extensive attention. Medical image segmentation is an active and important field in medical image analysis. It helps clinicians make diagnoses by automatically locating and marking diseased tissues. Therefore, automatic medical image segmentation is significant for facilitating quantitative pathological evaluation, treatment planning, and monitoring of disease progression [1]. However, due to various factors, such as background artifacts, noise, varied shapes and sizes of lesions, and blurred boundaries, accurate segmentation remains a challenging task.

In recent years, Convolutional Neural Networks (CNNs) have made great progress on computer vision tasks, such as medical image classification [2], [3], [4], object detection [5], [6], [7] and image retrieval [8]. Unsurprisingly, CNNs have also achieved strong results on semantic segmentation tasks. Ciresan et al. [9] propose a sliding-window-based pipeline using a CNN for semantic segmentation. Long et al. [10] propose the fully convolutional network (FCN), which removes the fully connected layers and uses only convolutional layers for segmentation. Based on FCN, SegNet [11] employs a symmetrical encoder–decoder architecture: the encoder extracts spatial features, then the decoder restores the low-resolution feature maps to the original resolution and predicts the segmentation masks.

Naturally, CNNs have also been introduced for medical image segmentation. U-Net [12] is one of the most popular CNN-based methods for this task. Similar to SegNet, U-Net includes an encoding path and a decoding path. The encoding path gradually reduces the feature map resolution and learns sophisticated features of the input image. The decoding path restores the low-resolution feature maps to the original input size by upsampling. However, it is well known that downsampling leads to the loss of meaningful information and degrades segmentation performance [13]. To overcome this problem, U-Net introduces skip connections, which concatenate features from the encoder and decoder to obtain more meaningful features. U-Net beat FCN and achieved state-of-the-art (SOTA) performance on medical images at the time. Since then, many U-Net variants have been proposed, including U-Net++ [14], R2UNet [15], Attention-UNet [16], ConvLSTMU-Net [17], etc.
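The interplay of downsampling, upsampling, and skip connections described above can be illustrated with a minimal NumPy sketch. The pooling and upsampling operators and the array shapes here are illustrative assumptions for a single-channel feature map, not U-Net's actual learned layers:

```python
import numpy as np

def down(x):
    """2x2 max pooling: halves the resolution and discards fine detail."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def up(x):
    """Nearest-neighbour upsampling back to the original resolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.random.rand(8, 8)       # encoder feature map
decoded = up(down(x))          # resolution restored, but detail is lost
skip = np.stack([decoded, x])  # skip connection: concatenate the upsampled
                               # decoder map with the original encoder features
print(skip.shape)  # (2, 8, 8)
```

The concatenated tensor gives the decoder direct access to the high-resolution encoder features that pooling destroyed, which is the motivation for U-Net's skip connections.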

However, when the lesions in medical images vary significantly in size and shape, existing encoder–decoder-based architectures such as U-Net cannot produce accurate segmentation masks; we call this the scale adaptation problem. In this case, existing encoder–decoder-based architectures do not provide sufficient multi-scale features for generating accurate segmentations. A common way to tackle this problem is to design new skip connections to explore multi-scale features, as in MDU-Net [18], H-DenseUNet [19] and U-Net++ [14]. In addition, multi-scale-based methods have been developed to deal with the scale adaptation problem. The atrous spatial pyramid pooling (ASPP) module [20] and the pyramid pooling module (PPM) [21] are widely used to extract multi-scale features. For example, PoolNet [22] processes the feature maps from the deepest layer via multiple parallel pooling operations with different kernel sizes. CE-Net [23] adopts multiple dilated convolution branches with different dilation rates to obtain rich multi-scale context features. Although networks with new skip connections and multi-scale-based methods can alleviate the scale adaptation problem to a certain extent, they cannot automatically select the most adaptable scale features from the extracted multi-scale features, which is necessary for accurate segmentation of medical images [24].
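To make the two ideas concrete, the following NumPy sketch builds parallel dilated branches in the spirit of ASPP/CE-Net, then fuses them with a selective-kernel-style softmax gate over per-branch global descriptors [24]. Every function name, the averaging kernel, and the dilation rates are illustrative assumptions, not the implementation of any of the cited methods:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dilated_conv2d(x, kernel, rate):
    """Naive single-channel 2D convolution with dilation `rate` ('same' padding)."""
    k = kernel.shape[0]
    eff = rate * (k - 1) + 1          # effective receptive-field size
    pad = eff // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + eff:rate, j:j + eff:rate] * kernel)
    return out

def multi_scale_features(x, rates=(1, 2, 4)):
    """One dilated branch per rate; branches stacked along axis 0."""
    kernel = np.ones((3, 3)) / 9.0    # illustrative averaging kernel
    return np.stack([dilated_conv2d(x, kernel, r) for r in rates])

def select_scale(branches):
    """Softmax gate over global-average-pooled branch descriptors."""
    desc = branches.mean(axis=(1, 2))       # one descriptor per scale
    w = softmax(desc)                       # attention weights over scales
    return np.tensordot(w, branches, axes=1)  # weighted fusion of branches

feat = np.random.rand(16, 16)
branches = multi_scale_features(feat)  # (3, 16, 16): one map per dilation rate
fused = select_scale(branches)         # (16, 16): adaptively selected scale mix
print(branches.shape, fused.shape)
```

A plain ASPP-style design would stop at `multi_scale_features` and concatenate the branches; the `select_scale` gate is what the text argues is missing from such methods.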

Additionally, in medical images containing multiple lesions of different sizes, the context of large-scale lesions may harm the segmentation of small-scale lesions, resulting in the so-called small-scale lesion problem. One class of methods [25], [26] attempts to capture more small-scale lesion features by fusing multi-scale context features. Another class [27], [28] addresses the problem by extracting global contextual features, using enlarged kernel sizes or an effective encoding layer on top of the network. However, all these methods only conduct simple explorations of global contextual features and do not model the relationships between these features, so features from dominant salient objects (e.g., large-scale lesions) still affect the segmentation of inconspicuous objects (e.g., small-scale lesions).

To solve the scale adaptation problem and the small-scale lesion problem in medical image segmentation, and to improve segmentation performance as much as possible, we propose a novel network architecture named Triple Attention Network (TANet). TANet simultaneously fuses scale attention, position attention and channel attention, and uses Res2Net [29] as the encoder backbone to extract features. Moreover, a novel module, the Triple Attention Module (TAM), is proposed in TANet to address the two problems above. TAM consists of the Multi-scale Feature Selection Module (MFSM) and the Contextual Feature Extraction Module (CFEM), which it combines to extract discriminative features sufficiently and efficiently. To solve the scale adaptation problem, TAM utilizes MFSM to extract more multi-scale features from the high-level layers and to dynamically select the most adaptable scale features among them. Simultaneously, TAM uses CFEM to highlight the feature representations of small-scale lesions and avoid the influence of large-scale lesions. CFEM exploits the correlation in the channel and spatial dimensions between similar lesion features, introducing a self-attention mechanism to establish inter-pixel and inter-channel correlations. In this manner, CFEM can selectively aggregate similar lesion features at any scale and improve their feature representations. Therefore, CFEM not only obtains global contextual features but also captures the relationships between them.
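The inter-pixel and inter-channel correlations that CFEM relies on can be sketched with generic position and channel self-attention in NumPy. This is a minimal illustration of the mechanism only: a real module would add learned query/key/value projections and residual scaling, none of which are shown here, and the function names are our own:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(x):
    """Re-weight each pixel by its affinity to every other pixel.
    x: (C, H, W) feature map."""
    C, H, W = x.shape
    f = x.reshape(C, H * W)           # flatten spatial dimensions
    attn = softmax(f.T @ f)           # (HW, HW) inter-pixel affinity
    out = f @ attn.T                  # aggregate features of similar positions
    return out.reshape(C, H, W)

def channel_attention(x):
    """Re-weight each channel by inter-channel affinity."""
    C, H, W = x.shape
    f = x.reshape(C, H * W)
    attn = softmax(f @ f.T)           # (C, C) inter-channel affinity
    return (attn @ f).reshape(C, H, W)

x = np.random.rand(4, 8, 8)
y = position_attention(x) + channel_attention(x)  # fused contextual features
print(y.shape)  # (4, 8, 8)
```

Because every pixel attends to every other pixel regardless of distance, similar lesion features are aggregated at any scale, which is the property the text attributes to CFEM.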

The main contributions of this paper are summarized as follows:

(1) A novel network architecture, Triple Attention Network (TANet), is presented for medical image segmentation tasks. TANet outperforms the existing SOTA methods on five polyp datasets and one skin lesion dataset.

(2) A new network module, the Triple Attention Module (TAM), is proposed to address the scale adaptation problem and the small-scale lesion problem. TAM captures more discriminative features by combining the Multi-scale Feature Selection Module and the Contextual Feature Extraction Module.

(3) Extensive experiments are conducted to demonstrate the effectiveness of our proposed method. All experimental results consistently show that our method is superior to the existing medical image segmentation methods.

Section snippets

Related works

Traditional medical image segmentation methods are mainly based on hand-crafted features [30], [31], [32], [33], [34], [35]. These methods not only require a large amount of manual effort, but also tend to produce misjudgments or over-segmentation. In recent years, many CNN-based methods have been proposed and have made brilliant achievements in medical image analysis [10], [11], [12], [36], [37]. Among these CNN-based methods, encoder–decoder or U-shape-based networks are prevalent for medical image

Overview of network architecture

In this paper, we propose a new network, called Triple Attention Network (TANet), for medical image segmentation tasks. The overall architecture of TANet is presented in Fig. 1. TANet mainly includes three parts: (1) Feature Encoder; (2) Triple Attention Module (TAM), and (3) Feature Aggregation Module (FAM). Compared with traditional U-shape architecture (e.g., U-Net), the encoder and decoder in our model are not entirely symmetrical. According to the observations from [13], [49], the

Experiments

Experiments are conducted on two types of medical images, namely polyp and skin lesion images. The proposed TANet and the existing SOTA methods are compared on the polyp segmentation task and skin lesion segmentation task. The details of the experiments are presented in the following subsections.

Conclusions

In this paper, we have proposed a novel deep network called Triple Attention Network (TANet) for medical image segmentation. The Triple Attention Module (TAM) presented in TANet can capture more discriminative features by combining the proposed Multi-scale Feature Selection Module (MFSM) and Contextual Feature Extraction Module (CFEM). In TAM, MFSM can extract more multi-scale features and select adaptable scale features from all features. With these adaptable scale features, CFEM can extract

CRediT authorship contribution statement

Xin Wei: Conceptualization, Methodology, Software, Writing – review & editing. Fanghua Ye: Data curation, Writing – original draft. Huan Wan: Investigation. Jianfeng Xu: Software, Validation. Weidong Min: Writing – review & editing.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Xin Wei reports financial support was provided by National Natural Science Foundation of China. Weidong Min reports financial support was provided by National Natural Science Foundation of China. Weidong Min reports financial support was provided by Jiangxi key Laboratory of Smart City. Xin Wei reports a relationship with Beijing Jiaotong University that

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 62106093, 62076117, 62106090), Jiangxi Key Laboratory of Smart City (Grant No. 20192BCD40002), the Urgent Need for Overseas Talent project (Grant No. 20223BCJ25040, 20223BCJ25026) and Jiangxi Training Program for Academic and the Technical Leaders in Major Disciplines - Leading Talents Project (Grant No. 20225BCJ22016).

References (72)

  • Saha, M., et al., Her2Net: A deep framework for semantic segmentation and classification of cell membranes and nuclei in breast cancer evaluation, IEEE Trans. Image Process. (2018)
  • Nardelli, P., et al., Pulmonary artery–vein classification in CT images using deep learning, IEEE Trans. Med. Imaging (2018)
  • Poudel, S., et al., Colorectal disease classification using efficiently scaled dilation in convolutional neural network, IEEE Access (2020)
  • Shin, H.-C., et al., Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Trans. Med. Imaging (2016)
  • Zhang, J., et al., Detecting anatomical landmarks from limited medical imaging data using two-stage task-oriented deep neural networks, IEEE Trans. Image Process. (2017)
  • Ding, L., et al., A novel deep learning pipeline for retinal vessel detection in fluorescein angiography, IEEE Trans. Image Process. (2020)
  • Ciresan, D., et al., Deep neural networks segment neuronal membranes in electron microscopy images, Adv. Neural Inf. Process. Syst. (2012)
  • Long, J., Shelhamer, E., Darrell, T., Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE...
  • Badrinarayanan, V., et al., SegNet: A deep convolutional encoder–decoder architecture for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
  • Ronneberger, O., et al., U-Net: Convolutional networks for biomedical image segmentation
  • Zhou, Z., et al., UNet++: A nested U-Net architecture for medical image segmentation
  • Alom, M.Z., et al., Recurrent residual U-Net for medical image segmentation, J. Med. Imaging (2019)
  • Oktay, O., et al., Attention U-Net: Learning where to look for the pancreas (2018)
  • Azad, R., Asadi-Aghbolaghi, M., Fathy, M., Escalera, S., Bi-directional ConvLSTM U-Net with densley connected convolutions, ...
  • Zhang, J., et al., MDU-Net: Multi-scale densely connected U-Net for biomedical image segmentation (2018)
  • Li, X., et al., H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes, IEEE Trans. Med. Imaging (2018)
  • Chen, L.-C., et al., Rethinking atrous convolution for semantic image segmentation (2017)
  • Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J., Pyramid scene parsing network, in: Proceedings of the IEEE Conference on...
  • Liu, J.-J., Hou, Q., Cheng, M.-M., Feng, J., Jiang, J., A simple pooling-based design for real-time salient object...
  • Gu, Z., et al., CE-Net: Context encoder network for 2D medical image segmentation, IEEE Trans. Med. Imaging (2019)
  • Li, X., Wang, W., Hu, X., Yang, J., Selective kernel networks, in: Proceedings of the IEEE/CVF Conference on Computer...
  • Ding, H., Jiang, X., Shuai, B., Liu, A.Q., Wang, G., Context contrasted feature and gated multi-scale aggregation for scene...
  • Lin, G., Milan, A., Shen, C., Reid, I., RefineNet: Multi-path refinement networks for high-resolution semantic...
  • Peng, C., Zhang, X., Yu, G., Luo, G., Sun, J., Large kernel matters–Improve semantic segmentation by global convolutional...
  • Wang, J., et al., Global context encoding for salient objects detection
  • Gao, S., et al., Res2Net: A new multi-scale backbone architecture, IEEE Trans. Pattern Anal. Mach. Intell. (2019)
1 Equal contribution.