Transformer-based factorized encoder for classification of pneumoconiosis on 3D CT images

https://doi.org/10.1016/j.compbiomed.2022.106137

Highlights

  • This was the first deep learning study to classify pneumoconiosis on 3D CT images.

  • TBFE captures both the intra-slice relationship and inter-slice, slice-level information exchange.

  • TBFE can be applied in clinical practice to promote the early diagnosis of pneumoconiosis.

Abstract

In the past decade, deep learning methods have been applied to medical imaging and have achieved good performance. Recently, deep learning algorithms have also been successful in the diagnostic evaluation of lung images. Although chest radiography (CR) is the standard modality for diagnosing pneumoconiosis, computed tomography (CT) typically provides more detail of the lesions in the lung. Thus, a transformer-based factorized encoder (TBFE) is proposed and, for the first time, applied to the classification of pneumoconiosis depicted on 3D CT images. Specifically, the factorized encoder consists of two transformer encoders. The first transformer encoder enables intra-slice interaction by encoding feature maps from the same CT slice. The second transformer encoder explores inter-slice interaction by encoding feature maps from different slices. In addition, there is no accepted grading standard on CT for labeling pneumoconiosis lesions; thus, an acknowledged CR-based grading system was applied to mark the corresponding pneumoconiosis stage on CT. We then pre-trained a 3D convolutional autoencoder on the public LIDC-IDRI dataset and fixed the parameters of the last convolutional layer of the encoder to extract CT feature maps carrying underlying spatial structural information from our 3D CT dataset. Experimental results demonstrated the superiority of TBFE over other 3D-CNN networks, achieving an accuracy of 97.06%, a recall of 89.33%, a precision of 90%, and an F1-score of 93.33% under 10-fold cross-validation.
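As a rough illustration of the pre-training strategy described above, the following sketch (written in PyTorch, which is an assumption; the paper does not state its framework here) builds a small 3D convolutional autoencoder, freezes its encoder after reconstruction pre-training, and reuses it as a feature extractor for CT volumes. All layer sizes, the toy volume shape, and the reconstruction objective are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: pre-train a 3D convolutional autoencoder on unlabeled CT
# volumes (e.g., LIDC-IDRI), then freeze the encoder and reuse its feature
# maps. Channel and kernel sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class CAE3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):                    # x: (B, 1, D, H, W) CT volume
        z = self.encoder(x)                  # latent spatial feature maps
        return self.decoder(z), z

cae = CAE3D()
# ... pre-train with an L2 reconstruction loss on unlabeled CT volumes ...
for p in cae.encoder.parameters():           # freeze the encoder afterwards
    p.requires_grad = False
volume = torch.randn(1, 1, 64, 128, 128)     # toy volume
_, features = cae(volume)                    # features: (1, 64, 8, 16, 16)
```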

Introduction

Pneumoconiosis is a worldwide occupational respiratory disease caused mainly by silica inhalation [1]. People with a history of exposure to mineral dust (e.g., asbestos and silica dust), or with silicosis associated with artificial stone, may take decades to develop pneumoconiosis [2]. However, recent findings suggest that, in some cases, the disease develops in far less time [3], [4]. Given that the disease is irreversible and effective treatments are unavailable, early detection of pneumoconiosis is crucial for taking preventive measures and delaying progression.

Chest radiography (CR) has a long history as a standard tool for pneumoconiosis screening. Pneumoconiosis diagnosis by CR is primarily based on the international criteria published by the International Labour Office (ILO) [5]. Earlier methods for pneumoconiosis diagnosis [6], [7], [8] mainly relied on texture analysis to learn handcrafted features from CR, dividing the left and right lung fields into upper, middle, and lower regions to extract features from the regions of interest (ROIs). These features were then fed to a classifier, such as a Multi-layer Perceptron (MLP) [9], a Support Vector Machine (SVM) [10], or a Random Forest (RF) [11], for disease classification. With the development of deep learning, extensive research on convolutional neural networks (CNNs) has been conducted to diagnose lung disorders with positive results, laying the groundwork for CT-based diagnosis of pneumoconiosis. For example, Wang et al. [12] proposed COVID-Net, built on an ImageNet [13] pre-trained ResNeXt50 [14] network with a lightweight residual projection-expansion-projection-extension (PEPX) design pattern; COVID-Net achieved a test accuracy of 93.3%. Most lung diagnosis methods are implemented on top of an ImageNet [13] pre-trained CNN model. For pneumoconiosis diagnosis, Zheng et al. [15] employed CNN models such as LeNet [16], AlexNet [17], and GoogLeNet [18] (Inception-v1 & v2) and reconstructed them using convolutional kernel decomposition; the optimized model, GoogLeNet-CF, achieved an accuracy of approximately 96.88% when the training set was enlarged to 1600 images. Wang et al. [19] collected a dataset of 923 pneumoconiosis CR images and 958 normal CR images and used the GoogLeNet [18] (Inception-v3) model to detect pneumoconiosis, achieving an area under the curve (AUC) of 87.80%. Devnath et al. [20] used two CNN models to extract high-level features from pneumoconiosis CR images: DenseNet [21] without pre-training and the pre-trained CheXNet architecture [22]; the hybrid CheXNet achieved an accuracy of 92.68% in the automated detection of pneumoconiosis. The CR-based methods above share two characteristics: (1) they rely on ImageNet [13] pre-trained models as backbones; and (2) they mainly address two- or three-class classification. However, pneumoconiosis generally comprises several stages, and the differences between CR images at an early stage are sometimes subtle. Therefore, ImageNet [13] pre-trained models are not well suited to the multi-class classification of pneumoconiosis.

Because chest computed tomography (CT) offers higher resolution and more diagnostic information, thin-slice CT has been widely used as a benchmark for evaluating lung disorders and has proved more sensitive than CR. For example, Bhandary et al. [23] offered two alternative feature extraction approaches for CT images: a modified AlexNet (MAN) and a Principal Component Analysis (PCA) scheme that combined MAN-learned and hand-crafted features, with an SVM used to investigate lung pneumonia and cancer; the MAN-SVM achieved an accuracy of 97.27% on the LIDC-IDRI dataset [24]. To better diagnose lung disorders at different stages, researchers have turned to 3D CT images. For example, Fallahpoor et al. [25] used 3D-CNN networks [26], [27] for COVID-19 detection, including 3D ResNet, 3D DenseNet, and 3D SEResNet; trained on the combined dataset (Iranmehr and Moscow), the 3D-CNN networks achieved accuracies of 91% on the Iranmehr subset and 83% on the Moscow subset. These CT-based studies inform the classification of pneumoconiosis in this paper. However, such methods involve a considerable number of 3D or 2D convolution operations, whether a single 3D CNN or several 2D CNNs are used, and their computational cost is therefore high.

The Vision Transformer (ViT) [28] applies a transformer encoder directly to sequences of image patches for image classification. In contrast to CNNs, the vision transformer is effective at capturing long-range dependencies between patches and is computationally less expensive [29]. Consequently, vision transformers have achieved excellent results on lung disorders. For example, Gao et al. [30] split 3D CT volumes into 2D slices and developed a vision transformer based on attention models, alongside DenseNet [21], to classify COVID-19; in their experiments, ViT achieved an F1-score of 0.76, outperforming DenseNet's 0.72. Heidarian et al. [31] proposed a transformer-based framework (CAE-Transformer) to efficiently classify lung adenocarcinomas (LUACs) on whole 3D CT volumes. Existing transformer-based works consider the relationship within the same slice (intra-slice) but ignore the relationship between different slices (inter-slice), leading to insufficient distinction between different stages of pneumoconiosis.
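The following minimal PyTorch sketch illustrates the patch-sequence idea behind ViT referenced above: an image is cut into patches, each patch is linearly embedded, a class token is prepended, and the token sequence is passed through a standard transformer encoder. The patch size, embedding width, class count, and the omission of positional embeddings are simplifications assumed here for illustration.

```python
# Minimal ViT-style patch-sequence classification sketch (illustrative only).
import torch
import torch.nn as nn

patch, dim, n_cls = 16, 128, 4
to_patches = nn.Unfold(kernel_size=patch, stride=patch)    # cut image into 16x16 patches
embed = nn.Linear(patch * patch * 1, dim)                  # linear patch embedding (1 channel)
cls_token = nn.Parameter(torch.zeros(1, 1, dim))           # learnable classification token
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    num_layers=4,
)
head = nn.Linear(dim, n_cls)

x = torch.randn(2, 1, 224, 224)                            # toy grayscale slices
tokens = embed(to_patches(x).transpose(1, 2))              # (2, 196, dim) patch tokens
tokens = torch.cat([cls_token.expand(2, -1, -1), tokens], dim=1)
logits = head(encoder(tokens)[:, 0])                       # classify from the class token
# Positional embeddings are omitted here for brevity.
```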

The main idea of this study is to use 3D CT images to classify pneumoconiosis and improve its diagnosis across stages. To construct relationships between slices, a transformer-based factorized encoder (TBFE) is proposed for predicting pneumoconiosis stages; to our knowledge, this is the first deep learning study to classify pneumoconiosis on 3D CT images. Specifically, TBFE consists of two transformer encoders and captures not only the intra-slice relationship but also the inter-slice relationship for slice-level information exchange. Moreover, only subjects with both CT and CR from the same admission were enrolled, resulting in a relatively small and unbalanced sample (especially for stage 0). To mitigate this bias, we pre-trained a 3D convolutional autoencoder on the public LIDC-IDRI dataset and used it to extract feature maps with underlying spatial structural information from our dataset for the factorized encoder. In addition, we replaced the classic cross-entropy classification loss with focal loss to reduce the impact of class imbalance. Furthermore, to demonstrate the effectiveness of the proposed method, we also validated it on another lung disorder dataset (the COVID-CT-MD dataset [32]).
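To make the factorized-encoder idea concrete, the sketch below stacks an intra-slice transformer encoder over the tokens of each slice and an inter-slice encoder over the pooled slice representations, trained with a focal loss in place of cross-entropy. The dimensions, mean pooling, layer depth, and class count are assumptions for illustration, not the authors' reported architecture.

```python
# Hedged sketch of a two-stage factorized encoder with focal loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedEncoder(nn.Module):
    def __init__(self, dim=128, n_classes=5, depth=2, heads=8):
        super().__init__()
        self.intra = nn.TransformerEncoder(          # token mixing within each slice
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True), depth)
        self.inter = nn.TransformerEncoder(          # information exchange across slices
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True), depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                            # x: (B, S slices, T tokens, dim)
        B, S, T, D = x.shape
        tokens = self.intra(x.reshape(B * S, T, D))  # intra-slice encoder
        slices = tokens.mean(dim=1).reshape(B, S, D) # pool tokens into one vector per slice
        mixed = self.inter(slices)                   # inter-slice encoder
        return self.head(mixed.mean(dim=1))          # volume-level logits

def focal_loss(logits, target, gamma=2.0):
    ce = F.cross_entropy(logits, target, reduction="none")
    pt = torch.exp(-ce)                              # estimated probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()         # down-weights easy, well-classified samples

model = FactorizedEncoder()
feats = torch.randn(2, 40, 49, 128)                  # toy feature maps: 40 slices x 49 tokens each
loss = focal_loss(model(feats), torch.tensor([0, 3]))
```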

In summary, the main contributions of this paper can be described as follows:

  • A transformer-based factorized encoder was proposed for classifying pneumoconiosis on 3D CT images by combining intra-slice and inter-slice interaction information.

  • Extensive experiments were conducted on the pneumoconiosis and COVID-CT-MD datasets. The experimental results demonstrate the superior performance of the proposed method.

The rest of this paper is organized as follows. Section 2 describes materials and methods in detail. Section 3 reports the model training and validation. Section 4 presents the discussion. Finally, Section 5 presents the conclusion.

Section snippets

Materials and methods

The present study used a pneumoconiosis dataset (CRs and CTs), which was approved by the Ethics Committee of the hospital; informed consent for CR and CT was waived for all enrolled subjects.

Patients and dataset

Our study employed two datasets. The first dataset is a cohort of 343 subjects, including 121 (35.3%) healthy controls, 35 (10.2%) stage 0, 60 (17.5%) stage I, 43 (12.5%) stage II, and 84 (24.5%) stage III pneumoconiosis patients. Other characteristics of the participants are presented in Table 2. Furthermore, Fig. 3 shows comparisons of CT images corresponding to different stages of pneumoconiosis.

To verify that the proposed method is also applicable to other lung disorders, we validated it on another lung disorder dataset, the COVID-CT-MD dataset [32].

Discussion

To the best of our knowledge, this is the first study to explore the feasibility of the proposed method for classifying pneumoconiosis depicted on 3D CT images. In the past decade, traditional machine learning methods were applied to classify abnormalities on 2D CR and achieved good performance [45], [46], [47], [48], [49]. Deep learning for image recognition has developed rapidly in recent years. However, its application to the medical image diagnosis of pneumoconiosis remains predominantly on 2D CR images and

Conclusion

A transformer-based factorized encoder is proposed to classify the severity of pneumoconiosis based on 3D CT images. Its application may promote the early diagnosis of pneumoconiosis and other lung diseases.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (50)

  • Yu, Peichun, et al., An automatic computer-aided detection scheme for pneumoconiosis on digital chest radiographs, J. Digit. Imaging (2011)
  • Yu, Peichun, et al., Computer aided detection for pneumoconiosis based on co-occurrence matrices analysis
  • Haykin, Simon, Neural Networks and Learning Machines, 3/E (2009)
  • Vapnik, Vladimir N., An overview of statistical learning theory, IEEE Trans. Neural Netw. (1999)
  • Liaw, Andy, et al., Classification and regression by randomForest, R News (2002)
  • Deng, Jia, et al., ImageNet: A large-scale hierarchical image database
  • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He, Aggregated residual transformations for deep neural...
  • Zheng, Ran, et al., An improved CNN-based pneumoconiosis diagnosis method on X-ray chest film
  • LeCun, Yann, et al., Gradient-based learning applied to document recognition, Proc. IEEE (1998)
  • Krizhevsky, Alex, et al., ImageNet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. (2012)
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent...
  • Wang, Xiaohua, et al., Potential of deep learning in assessing pneumoconiosis depicted on digital chest radiography, Occup. Environ. Med. (2020)
  • Gao Huang, Zhuang Liu, Laurens Van Der Maaten, Kilian Q Weinberger, Densely connected convolutional networks, in:...
  • Rajpurkar, Pranav, et al., CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning (2017)
  • Armato III, Samuel G., et al., The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans, Med. Phys. (2011)
1 Yingying Huang and Yang Si contributed equally to this work and should be considered co-first authors.
