Feature Ranking Importance from Multimodal Radiomic Texture Features using Machine Learning Paradigm: A Biomarker to Predict the Lung Cancer
Introduction
Lung cancer is the second most common cancer among both men and women across the globe, and non-small cell lung cancer (NSCLC) in particular accounts for about 85% of all lung cancer cases in the United States [1]. In men it ranks after prostate cancer, and in women after breast cancer [2]. The disease is characterized by abnormal growth in lung tissue. The number of patients affected by lung cancer has increased dramatically worldwide. In 2018, 234,030 new lung cancer cases were expected to be diagnosed, about 85% of them NSCLC [3], [4]. Researchers are therefore developing tools to detect lung cancer at an early stage and so increase patient survival.
Brain metastases develop in up to 65% of patients with a primary lung tumor [5]. The main subtypes of lung cancer are non-small cell lung carcinoma (NSCLC) and small cell lung carcinoma (SCLC), accounting for roughly 80% and 20% of all lung cancers, respectively [6], [7]. Both types have multiple treatment options and different causes of spread. A cancer that shows features of both types is called mixed large cell/small cell cancer [8]. For early-stage NSCLC, two important treatment techniques are stereotactic body radiotherapy (SBRT) and radiofrequency (RF) ablation [9]. Surgery combined with adjuvant therapy is also used to treat NSCLC [10], [11], with much better prognosis and newer treatment opportunities. Currently, the two tumor types can be distinguished only by histologic tissue characterization following tissue biopsy. To enable more personalized treatment strategies, a non-invasive method for accurate preoperative diagnosis is desirable, as it would improve the decision-making process. Existing studies provide automated tools to classify SCLC and NSCLC from histopathology images [12], [13]. Another study used CT images to classify tumor types [14] and reported an accuracy of 85.71%. Grossman et al. used EfficientNet with a transfer learning approach to distinguish NSCLC from SCLC brain metastases on MRI.
The malignancy and prognosis of patients with NSCLC are affected by numerous factors, including cancer stage, lymph node metastasis, and the histologic degree of tumor differentiation (DTD) [15], [16]. Consequently, accurate information about the cancer subtype and tumor grade is crucial for estimating disease prognosis. A poor prognosis is associated with poorly differentiated tumors, whereas well-differentiated tumors are linked to a good prognosis and a less invasive nature [17], [18]. Therefore, accurately discriminating the DTD before surgery is critical for designing therapeutic strategies and for predicting disease prognosis [19].
SCLC grows and spreads much faster than NSCLC [20]. SCLC is commonly associated with smoking, and its death rate is related to the number of cigarettes smoked [21]. Experienced pathologists can evaluate microscopic histopathology slides to identify and diagnose malignant cells [22], [23]. NSCLC is further subcategorized into subtypes such as squamous cell carcinoma (SqCC) and adenocarcinoma [24], [25]. Early detection of NSCLC is associated with survival rates of 35-80%, depending on the stage of the tumor. Several imaging modalities are available for diagnosing pulmonary nodules, including magnetic resonance imaging (MRI), chest radiography (X-ray), positron emission tomography (PET), and computed tomography (CT). Some researchers prefer MRI because it avoids exposing patients to ionizing radiation, which is harmful and can potentially increase lifetime cancer risk [26]. Moreover, diffusion-weighted (DW) MRI has also been used to diagnose lung tumors because it allows qualitative assessment of high b-value images and apparent diffusion coefficient (ADC) maps, in addition to quantitative median and mean tumor ADCs [27]. For lung cancer staging, however, the most widely used modalities are PET and CT. CT is more likely than the other modalities to show lung tumors because of its high resolution and clear contrast. In this study, we used CT images of SCLC and NSCLC.
Not all extracted features contribute equally; their importance can be determined by ranking them with different ranking algorithms. For feature selection and the subsequent processing of relevant feature information, feature importance ranking (FIR) is needed to identify which features in a set carry the most information, which can help clinicians make decisions. We used the empirical receiver operating characteristic (EROC) curve and the random classifier slope to rank the features. For the classification of lung cancer types, researchers have previously applied a range of techniques to the detection and prognosis of lung cancer, including data mining [28], fuzzy rules [29], medical imaging [30], [31], [32], [33], and machine learning [34], [35], [36].
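One common reading of ranking by "EROC and random classifier slope" is to score each feature by the area between its empirical ROC curve and the diagonal of a random classifier, which equals |AUC - 0.5|. The sketch below illustrates that idea under this assumption; it is not the authors' implementation. The feature names and toy values are hypothetical and serve only to make the example runnable.

```python
def auc(values, labels):
    """Empirical AUC of one feature: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_features(columns, labels):
    """Sort features by the area between their ROC curve and the chance line.

    Assumption: the ranking score is |AUC - 0.5|, so a feature that separates
    the classes in either direction ranks above an uninformative one.
    """
    scores = {name: abs(auc(vals, labels) - 0.5)
              for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 0 and 1 stand for the two tumor classes.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "glcm_contrast": [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],   # perfectly separated
    "glrlm_sre": [0.9, 0.5, 0.6, 0.4, 0.7, 0.2],       # partly informative
    "mean_intensity": [0.5, 0.1, 0.9, 0.5, 0.1, 0.9],  # uninformative
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value means that a feature whose values are systematically lower in one class is treated as being as informative as one whose values are systematically higher.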
The aim of this study is to extract multimodal features that capture the intrinsic dynamics present in lung cancer subtypes. Differentiating lung cancer subtypes remains challenging because the subtypes share similar hidden dynamics. Differentiating SCLC from NSCLC brain metastases is critical because SCLC tends to disseminate earlier in the course of the disease, is more clinically aggressive, and is usually treated non-surgically. We first applied image enhancement methods, including contrast stretching and gamma correction, to improve image quality and thereby the classification of NSCLC versus SCLC. We then computed multimodal radiomic features, including texture, Haralick texture, gray-level size-zone matrix (GLSZM), gray-level run-length matrix (GLRLM), and gray-level co-occurrence matrix (GLCM) features, for both NSCLC and SCLC. After extracting the features, we applied ranking algorithms based on EROC and the random classifier slope, which rank the features by importance. We then fed the extracted features of each feature type, and the categorized ranked features, as input to different supervised machine learning algorithms: Naïve Bayes, decision tree, and SVM with Gaussian, RBF and polynomial kernels. The study thus targeted two major improvements in classifying lung cancer types: image enhancement during pre-processing to improve image quality, and feature ranking algorithms to rank feature importance. The ranked features are categorized by the ROC values obtained. We observed that the top-ranked features yielded more than 95% classification performance among all extracted features from each sub-type. The top-ranked features from each category can thus be used by clinicians as predictors for further decision making and for the early prognosis and diagnosis of lung cancer.
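The enhancement and texture steps described above can be illustrated with a minimal, dependency-free sketch. It assumes 8-bit grayscale images stored as nested lists, a linear min-max contrast stretch, a power-law gamma correction, and a symmetric GLCM for a single horizontal offset. The gamma value, the number of gray levels, the offset, and the toy image are all illustrative assumptions, not settings reported in the study.

```python
from collections import Counter


def contrast_stretch(img):
    """Linearly rescale pixel values so they span the full 0..255 range."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def gamma_correct(img, gamma):
    """Power-law transform: out = 255 * (in / 255) ** gamma."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in img]


def quantize(img, levels):
    """Map 8-bit values onto `levels` gray levels (0 .. levels - 1)."""
    return [[p * levels // 256 for p in row] for row in img]


def glcm(img, dx=1, dy=0):
    """Symmetric, normalized co-occurrence probabilities for offset (dx, dy)."""
    counts = Counter()
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both directions (symmetric GLCM)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_features(p):
    """Three standard GLCM descriptors computed from the probabilities p."""
    return {
        "contrast": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "asm": sum(v * v for v in p.values()),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
    }


# Hypothetical 4x4 toy image; real input would be a CT slice.
image = [[52, 55, 61, 59],
         [79, 61, 76, 61],
         [62, 59, 55, 104],
         [94, 85, 59, 71]]

enhanced = gamma_correct(contrast_stretch(image), gamma=0.8)  # gamma assumed
features = glcm_features(glcm(quantize(enhanced, levels=8)))  # levels assumed
print(features)
```

In practice the GLCM is usually computed for several offsets and directions and the resulting descriptors are averaged; a single offset is used here only to keep the sketch short.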
Section 2 describes the dataset, the proposed algorithmic steps to classify NSCLC from SCLC, the image enhancement methods, feature extraction, feature importance ranking, the supervised machine learning classification methods, and the performance evaluation measures. Section 3 presents the results and discussion, followed by the conclusion, limitations, and future recommendations.
Dataset
We used the publicly available database at www.giveascan.org provided by the Lung Cancer Alliance (LCA) [37]. This non-profit organization provides support and advocacy to patients who have, or are at risk of, lung cancer, and facilitates researchers. The database images are in Digital Imaging and Communications in Medicine (DICOM) format. There are 76 patients and 945 images in total, of which 568 images are from SCLC subjects and 377 from NSCLC subjects.
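Because the two classes are not equally represented, it can help to keep the class proportions in mind when reading accuracy figures. The short sketch below uses only the image counts stated above. The majority-class baseline it prints is a reference point added here for illustration, not a result reported by the authors.

```python
# Image counts as reported for the dataset.
counts = {"SCLC": 568, "NSCLC": 377}

total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the larger class.
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

for label, share in proportions.items():
    print(f"{label}: {counts[label]} images ({share:.1%})")
print(f"Majority-class baseline accuracy: {majority_baseline:.1%}")
```

With these counts, SCLC makes up about 60.1% of the images, so a classifier should be judged against that baseline rather than against 50%.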
The SCLC also termed as Oat Cell
Results and discussions
This study aimed to improve lung cancer detection performance based on extracted multimodal radiomic features and to rank these features to determine their importance. We first extracted multimodal radiomic features, namely texture and statistical (14), Haralick (16), GLCM (09), GLSZM (12) and GLRLM (13), with a total of 50 multimodal radiomic features. We ranked each category of these features. We computed the classification performance based on each category of features. The Haralick
Conclusion
Lung cancer is the leading cause of cancer mortality across the globe. Researchers are developing various automated tools to improve detection performance. In the past, researchers used different feature extraction approaches. However, feature ranking also plays a vital role in judging the importance of features based on various factors. The important features can be very helpful for clinicians and radiologists in making early decisions. In this study, we extracted
Limitations and future work
Currently, we do not have clinical information on the patients or their level of severity in the public open dataset. In future, we will validate the results on a larger dataset with more clinical information and time points. We will also apply further pre-processing methods, including segmentation techniques. Currently, we have computed the feature ranking importance based on EROC and the random classifier slope. In future, we will also test feature importance with other ranking methods. Moreover, currently,
List of abbreviations
Small cell lung cancer (SCLC), Non-small cell lung cancer (NSCLC), Degree of tumor differentiation (DTD), Magnetic resonance imaging (MRI), Chest radiography (X-ray), Positron emission tomography (PET), Computed tomography (CT), Diffusion weighted (DW), Feature importance ranking (FIR), Empirical receiver operating characteristics curve (EROC), Angular second moment (ASM), Multi-cluster feature selection (MCFS), Local Learning Based Clustering (LLC), Neighborhood Component Analysis (fNCA),
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was funded by the University of Jeddah, Jeddah, Saudi Arabia, under grant No. (UJ-21-DR-151). The authors therefore gratefully acknowledge the technical and financial support of the University of Jeddah.
References (116)
- et al.
Transformation from non-small-cell lung cancer to small-cell lung cancer: molecular drivers and cells of origin
Lancet Oncol.
(2015)
- et al.
Demographics of brain metastasis
Neurosurg. Clin. North. Am.
(1996)
- et al.
Mixed small cell and non-small cell lung cancer
Chest
(1986)
Classification and pathology of lung cancer
Surg. Oncol. Clin. N. Am.
(2016)
- et al.
A functional role for tumor cell heterogeneity in a mouse model of small cell lung cancer
Cancer Cell
(2011)
- et al.
Noninvasive Staging of Non-small Cell Lung Cancer: ACCP Evidenced-Based Clinical Practice Guidelines
(2007)
- et al.
Subtyping of undifferentiated non-small cell carcinomas in bronchial biopsy specimens
J. Thorac. Oncol.
(2010)
- et al.
Refining the diagnosis and EGFR status of non-small cell lung carcinoma in biopsy and cytologic material, using a panel of mucin staining, TTF-1, cytokeratin 5/6, and P63, and EGFR mutation analysis
J. Thorac. Oncol.
(2010)
- et al.
Can diffusion-weighted imaging be used as a reliable sequence in the detection of malignant pulmonary nodules and masses?
Magn. Reson. Imaging
(2013)
- et al.
Ensemble classification of colon biopsy images based on information rich hybrid features
Comput. Biol. Med.
(2014)
Automated colon cancer detection using hybrid of novel geometric features and some traditional features
Comput. Biol. Med.
Radiomics in brain tumor: image assessment, quantitative feature descriptors, and machine-learning approaches
Am. J. Neuroradiol.
Radiomics in neuro-oncology: basics, workflow, and applications
Methods
Radiomics in kidney cancer: MR imaging
Magn. Reson. Imaging Clin. N. Am.
Use of gray value distribution of run lengths for texture analysis
Pattern Recognit. Lett.
Unsupervised feature selection with adaptive residual preserving
Neurocomputing
The use of the area under the ROC curve in the evaluation of machine learning algorithms
Pattern Recognit.
Minimum–maximum local structure information for feature selection
Pattern Recognit. Lett.
Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders
Comput. Biol. Med.
Multiresolution MUAPs decomposition and SVM-based analysis in the classification of neuromuscular disorders
Comput. Methods Programs Biomed.
A co-evolving decision tree classification method
Expert Syst. Appl.
Cancer statistics
CA Cancer J. Clin.
A passive state simulation of an anal sphincter using simmechanics
J. Mech. Med. Biol.
Cancer statistics
CA Cancer J. Clin.
Non-small cell lung cancer: current treatment and future advances
Transl. Lung Cancer Res.
Differentiating small-cell lung cancer from non-small-cell lung cancer brain metastases based on MRI using efficientnet and transfer learning approach
Technol. Cancer Res. Treat.
Comparison of therapeutic results from radiofrequency ablation and stereotactic body radiotherapy in solitary lung tumors measuring 5 cm or smaller
Int. J. Clin. Oncol.
Treatment of brain metastasis from lung cancer
Cancers (Basel)
Deep learning for the classification of small-cell and non-small-cell lung cancer
Cancers (Basel)
Automated classification of lung cancer types from cytological images using deep convolutional neural networks
BioMed Res. Int.
Classification of pathological types of lung cancer from CT images by deep residual neural networks with transfer learning strategy
Open Med.
Development and validation of a prognostic gene-expression signature for lung adenocarcinoma
PLoS One
A genomic strategy to refine prognosis in early-stage non–small-cell lung cancer
N. Engl. J. Med.
A nomogram to predict brain metastases of resected non-small cell lung cancer patients
Ann. Surg. Oncol.
Expression profiles of thioredoxin family proteins in human lung cancer tissue: correlation with proliferation and differentiation
Histopathology
An array-based approach to determine different subtype and differentiation of non-small cell lung cancer
Theranostics
Diagnosis of lung cancer prediction system using data mining classification techniques
Int. J. Comput. Sci. Inf. Technol.
International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society: International Multidisciplinary Classification of Lung Adenocarcinoma
Proc. Am. Thorac Soc.
A novel pixel value space statistics map of the pulmonary nodule for classification in computerized tomography images
A new expert system in prediction of lung cancer disease based on fuzzy soft sets
Soft Comput.
Automated detection of pulmonary nodules in helical CT images based on an improved template-matching technique
IEEE Trans. Med. Imaging
A recent survey on colon cancer detection techniques
IEEE/ACM Trans. Comput. Biol. Bioinform.
Capture largest included circles: an approach for counting red blood cells
Detecting brain tumor using machine learning techniques based on different features extracting strategies
Curr. Med. Imaging
Automated breast cancer detection using machine learning techniques by extracting different feature extracting strategies
Prostate cancer detection using machine learning techniques by employing combination of features extracting strategies
Cancer Biomark.
The lung cancer alliance
J. Oncol. Pract.
Brightness preserving contrast enhancement of medical images using adaptive gamma correction and homomorphic filtering
Blind inverse gamma correction
IEEE Trans. Image Process.
Dark satellite image enhancement using knee transfer function and gamma correction based on DWT–SVD