Transformer-based factorized encoder for classification of pneumoconiosis on 3D CT images
Introduction
Pneumoconiosis is a worldwide occupational respiratory disease caused mainly by silica inhalation [1]. In people with a history of exposure to mineral dust (e.g., asbestos and silica dust), or with silicosis associated with artificial stone, pneumoconiosis may take decades to develop [2]. However, recent findings suggest that, in some cases, it can develop in less time [3], [4]. Because the disease is irreversible and no effective treatment is available, early detection of pneumoconiosis is crucial for taking preventive measures and delaying progression.
Chest radiography (CR) has long been the standard screening tool for pneumoconiosis. Diagnosis by CR is based primarily on the international criteria published by the International Labour Office (ILO) [5]. Earlier methods for pneumoconiosis diagnosis [6], [7], [8] relied mainly on texture analysis: handcrafted features were extracted from regions of interest (ROIs) in the upper, middle, and lower zones of the left and right lung fields. These features were then fed to a classifier, such as a multi-layer perceptron (MLP) [9], support vector machine (SVM) [10], or random forest (RF) [11], for disease classification.
With the development of deep learning, convolutional neural networks (CNNs) have been studied extensively for diagnosing lung disorders and have achieved positive results, laying the groundwork for CT-based diagnosis of pneumoconiosis. For example, Wang et al. [12] proposed COVID-Net, which is based on an ImageNet [13] pre-trained ResNeXt50 [14] network and uses a lightweight residual projection-expansion-projection-extension (PEPX) design pattern; COVID-Net achieved a test accuracy of 93.3%. Most lung diagnosis methods are built on ImageNet [13] pre-trained CNN models. For pneumoconiosis diagnosis, Zheng et al. [15] employed CNN models such as LeNet [16], AlexNet [17], and GoogLeNet [18] (Inception-v1 & v2) and then restructured them using convolutional kernel decomposition. The optimized model, GoogLeNet-CF, achieved an accuracy of approximately 96.88% when the training set size was increased to 1600. Wang et al. [19] collected a dataset of 923 pneumoconiosis CR images and 958 normal CR images and used the GoogLeNet [18] (Inception-v3) model to detect pneumoconiosis, achieving an area under the curve (AUC) of 87.80%. Devnath et al. [20] used two CNN models to extract high-level features from pneumoconiosis CR images: a DenseNet [21] without pre-training and a pre-trained CheXNet architecture [22]. Their hybrid CheXNet achieved an accuracy of 92.68% in the automated detection of pneumoconiosis. Most of the CR methods above share two characteristics: (1) they use an ImageNet [13] pre-trained model as the training backbone; (2) they mainly handle two- or three-class classification. However, pneumoconiosis generally comprises several stages, and the differences between CR images at early stages are sometimes small. Therefore, ImageNet [13] pre-trained models are not well suited to the multi-class classification of pneumoconiosis.
Because chest computed tomography (CT) offers higher resolution and more diagnostic information, thin-slice CT has been widely used as a benchmark for evaluating lung disorders and has proved to be more sensitive than CR. For example, Bhandary et al. [23] proposed two feature extraction approaches for CT images: a modified AlexNet (MAN) technique, and a principal component analysis (PCA) technique that combined MAN-learned and handcrafted features and used an SVM to investigate lung pneumonia and cancer. The MAN-SVM achieved an accuracy of 97.27% on the LIDC-IDRI dataset [24]. To better diagnose lung disorders at different stages, researchers have used 3D CT images for classification. For example, Fallahpoor et al. [25] used 3D-CNN networks [26], [27], including 3D ResNet, 3D DenseNet, and 3D SE-ResNet, for COVID-19 detection. On the combined dataset (the Iranmehr and Moscow datasets), the 3D-CNN achieved an accuracy of 91% on Iranmehr and 83% on Moscow. These CT studies informed the classification of pneumoconiosis in this paper. However, such methods involve a considerable number of 3D or 2D convolution operations, whether they use a 3D CNN model or several 2D CNN models; hence, their computational cost is high.
The Vision Transformer (ViT) [28] consists of a transformer encoder applied directly to sequences of image patches for image classification. In contrast to CNNs, ViT is effective at capturing long-range dependencies between patches and is computationally less expensive [29]. Consequently, vision transformers have achieved excellent results on lung disorders. For example, Gao et al. [30] split 3D CT images into several 2D CT images and developed a ViT based on attention models and DenseNet [21] to classify COVID-19. In their findings, ViT achieved an F1 score of 0.76 and outperformed DenseNet, which achieved an F1 score of 0.72. Heidarian et al. [31] proposed a transformer-based framework (CAE-Transformer) to efficiently classify lung adenocarcinomas (LUACs) on whole 3D CT images. Current transformer-based works consider the relationship within the same slice (intra-slice) but ignore the relationship between different slices (inter-slice), which leads to insufficient distinction between the stages of pneumoconiosis.
The main idea of this study is to use 3D CT images to classify pneumoconiosis and improve its diagnosis at different stages. To model the relationship between slices, we propose a transformer-based factorized encoder (TBFE) for predicting pneumoconiosis stages; to the best of our knowledge, this is the first deep learning study to classify pneumoconiosis on 3D CT images. Specifically, TBFE consists of two transformer encoders and captures not only the intra-slice relationship but also the inter-slice relationship for slice-level information exchange. Because only subjects who underwent both CT and CR during the same admission were enrolled, the sample is relatively small and unbalanced (especially for stage 0). To mitigate this bias, we first pre-trained a 3D convolutional autoencoder on the public LIDC-IDRI dataset and then used it to extract feature maps with underlying spatial structural information from our dataset as input to the factorized encoder. In addition, we replaced the classic classification loss function, cross-entropy, with the focal loss to reduce the impact of sample imbalance. Furthermore, to demonstrate the effectiveness of the proposed method, we also validated it on another lung disorder dataset (the COVID-CT-MD dataset [32]).
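As a minimal sketch of the loss substitution described above, the snippet below implements the multi-class focal loss in its standard form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), in plain Python. The focusing parameter `gamma`, the optional per-class weights `alpha`, and the function names are illustrative assumptions; the values used in this study are not stated in this section.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample.

    gamma=0 with alpha=None reduces this to ordinary cross-entropy.
    gamma and alpha are assumed hyperparameters, not values from the paper.
    """
    p_t = softmax(logits)[target]  # predicted probability of the true class
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Compare a confidently correct sample with an uncertain one (five classes).
easy = [4.0, 0.0, 0.0, 0.0, 0.0]  # true class 0 clearly favoured
hard = [0.0, 0.0, 0.0, 0.0, 0.0]  # uniform scores, p_t = 0.2
ce_easy = focal_loss(easy, 0, gamma=0.0)
fl_easy = focal_loss(easy, 0, gamma=2.0)
ce_hard = focal_loss(hard, 0, gamma=0.0)
fl_hard = focal_loss(hard, 0, gamma=2.0)
```

With `gamma=2.0`, the loss of the uncertain sample is scaled by (1 - 0.2)^2 = 0.64 relative to cross-entropy, while the loss of the confidently correct sample is scaled by a much smaller factor, so training focuses on hard examples.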
In summary, the main contributions of this paper can be described as follows:
- A transformer-based factorized encoder is proposed for classifying pneumoconiosis on 3D CT images by combining intra-slice and inter-slice interaction information.
- Extensive experiments were conducted on the pneumoconiosis and COVID-CT-MD datasets. The experimental results demonstrate the superior performance of the proposed method.
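The factorized design named in the first contribution can be sketched in a few lines. The plain-Python example below is an illustrative simplification, not the TBFE implementation: it uses single-head self-attention without learned projections, positional embeddings, or feed-forward layers, and mean pooling in place of a class token; all function names are hypothetical. It shows only the two-stage structure, in which attention is first applied among patch tokens within each slice (intra-slice) and then among the resulting slice tokens (inter-slice).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens."""
    dim = len(tokens[0])
    mixed = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        mixed.append([sum(w * tok[j] for w, tok in zip(weights, tokens))
                      for j in range(dim)])
    return mixed


def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    count = len(tokens)
    return [sum(tok[j] for tok in tokens) / count for j in range(len(tokens[0]))]


def factorized_encode(volume):
    """volume[s][p] is the feature vector of patch p in slice s."""
    # Stage 1 (intra-slice): patches attend to each other within one slice.
    slice_tokens = [mean_pool(self_attention(patches)) for patches in volume]
    # Stage 2 (inter-slice): slice-level tokens exchange information.
    return mean_pool(self_attention(slice_tokens))


# Toy volume: 3 slices, 4 patches per slice, 2-dimensional features.
volume = [[[0.1 * (s + 1), 0.2 * (p + 1)] for p in range(4)] for s in range(3)]
embedding = factorized_encode(volume)  # one vector for the whole volume
```

With S slices of P patches each, joint attention over all tokens compares (S·P)² token pairs, whereas this factorized arrangement compares S·P² + S² pairs.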
The rest of this paper is organized as follows. Section 2 describes materials and methods in detail. Section 3 reports the model training and validation. Section 4 presents the discussion. Finally, Section 5 presents the conclusion.
Materials and methods
The present study used a pneumoconiosis dataset (CRs and CTs). The study was approved by the Ethics Committee of the hospital, and informed consent for CR and CT was waived for all enrolled subjects.
Patients and dataset
Our study employed two datasets. The first dataset is a cohort of 343 subjects, including 121 (35.3%) healthy controls, 35 (10.2%) stage 0, 60 (17.5%) stage I, 43 (12.5%) stage II, and 84 (24.5%) stage III pneumoconiosis patients. Other characteristics of the participants are presented in Table 2. Furthermore, Fig. 3 shows comparisons of CT images corresponding to different stages of pneumoconiosis.
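The cohort composition above can be checked, and the degree of imbalance quantified, with a short calculation. The sketch below recomputes the reported percentages from the counts and derives inverse-frequency class weights of the form N / (K · n_c); this weighting scheme is a common heuristic shown here as an assumption, not the weighting reported for this study.

```python
# Subject counts per class, as reported for the pneumoconiosis cohort.
counts = {
    "healthy": 121,
    "stage 0": 35,
    "stage I": 60,
    "stage II": 43,
    "stage III": 84,
}

total = sum(counts.values())  # 343 subjects
percentages = {name: round(100.0 * n / total, 1) for name, n in counts.items()}

# Inverse-frequency weights (an assumed heuristic): rarer classes get larger weights.
num_classes = len(counts)
class_weights = {name: total / (num_classes * n) for name, n in counts.items()}

for name in counts:
    print(f"{name:9s} n={counts[name]:3d} {percentages[name]:5.1f}% "
          f"weight={class_weights[name]:.2f}")
```

Under this heuristic, stage 0 receives the largest weight (1.96), reflecting that it is the smallest group in the cohort.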
To verify that the proposed method is also applicable to other lung disorders, we validated it on
Discussion
To the best of our knowledge, this is the first study to explore the feasibility of classifying pneumoconiosis depicted on 3D CT images with the proposed method. Over the past decade, traditional machine learning methods have been used to classify abnormalities on 2D CR and have achieved good performance [45], [46], [47], [48], [49]. Deep learning for image recognition has been established for years. Recently, its application to medical image diagnosis of pneumoconiosis has been predominantly on 2D CR images and
Conclusion
A transformer-based factorized encoder is proposed to classify the severity of pneumoconiosis based on 3D CT images. Its application may promote early diagnosis of pneumoconiosis and other lung diseases.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (50)
- et al. Occupational lung diseases: from old and novel exposures to effective preventive strategies. Lancet Respir. Med. (2017)
- et al. A new era of coal workers’ pneumoconiosis: decades in mines may not be required. Lancet (2020)
- et al. Artificial stone silicosis: rapid progression following exposure cessation. Chest (2020)
- et al. Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images. Sci. Rep. (2020)
- et al. Automated detection of pneumoconiosis with multilevel deep features learned from chest X-Ray radiographs. Comput. Biol. Med. (2021)
- et al. Deep-learning framework to detect lung abnormality–A study with chest X-Ray and lung CT scan images. Pattern Recognit. Lett. (2020)
- et al. Generalizability assessment of COVID-19 3D CT data for deep learning-based disease detection. Comput. Biol. Med. (2022)
- The world is failing on silicosis. Lancet (2019)
- Guidelines for the use of ILO international classification of radiographs of pneumoconioses (1980)
- Horace Xu, Xiaodong Tao, Ramasubramanian Sundararajan, Weizhong Yan, Pavan Annangi, Xiwen Sun, Ling Mao, Computer aided...
- An automatic computer-aided detection scheme for pneumoconiosis on digital chest radiographs. J. Digit. Imaging
- Computer aided detection for pneumoconiosis based on co-occurrence matrices analysis
- Neural Networks and Learning Machines, 3/E
- An overview of statistical learning theory. IEEE Trans. Neural Netw.
- Classification and regression by randomforest. R News
- Imagenet: A large-scale hierarchical image database
- An improved CNN-based pneumoconiosis diagnosis method on X-ray chest film
- Gradient-based learning applied to document recognition. Proc. IEEE
- Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst.
- Potential of deep learning in assessing pneumoconiosis depicted on digital chest radiography. Occup. Environ. Med.
- Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning
- The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on ct scans. Med. Phys.
1 Yingying Huang and Yang Si contributed equally to this work and should be considered co-first authors.