Lung Cancer Detection: A Classification Approach Utilizing Oversampling and Support Vector Machines

Jara-Gavilanes, Adolfo; Robles-Bykbaev, Vladimir

doi:10.1007/s42979-023-02432-6

Original Research
Published: 08 December 2023

SN Computer Science, Volume 5, article number 74 (2024)

Abstract

Lung cancer causes more deaths each year than any other type of cancer, and it is also the cancer with the lowest survival rate, which makes it a worldwide health problem. Lung cancer has two subtypes: Non-Small Cell Lung Cancer (NSCLC) and Small Cell Lung Cancer (SCLC). It can be hard for doctors to detect and differentiate them, so in this work we present a method to help with this task. The method consists of three phases. The first phase is image preprocessing: the data are gathered, the PET scans are selected, all scans are converted to grayscale images, and finally the images are joined to create a video from each patient’s scan. The second phase is data extraction: some frames are extracted from each video, then flattened and blended to create a row of information. The result is a dataframe in which each row represents a patient and each column is a pixel value. To obtain better results, an oversampling technique is applied so that the classes are balanced. A dimensionality reduction technique is then applied to reduce the number of columns produced by the previous steps and to check whether it improves the results yielded by each model. The third phase is model evaluation, in which two models are created: a Support Vector Machine (SVM) and a Random Forest. The SVM was the top-performing model, with 97% accuracy, 98% precision, and 97% sensitivity. In the future, this method could be applied to detect and classify other diseases that involve PET scans.
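The data extraction phase can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how frames are "blended", so the example assumes a pixel-wise average, and the function names (`flatten_frame`, `blend_frames`, `build_table`) and the toy 2x2 frames are hypothetical.

```python
from statistics import fmean


def flatten_frame(frame):
    """Turn a 2-D grayscale frame (a list of pixel rows) into a flat list."""
    return [pixel for row in frame for pixel in row]


def blend_frames(frames):
    """Average equally sized frames pixel by pixel.

    Averaging is an assumption; the abstract only says frames are "blended".
    """
    flat = [flatten_frame(frame) for frame in frames]
    if len({len(values) for values in flat}) != 1:
        raise ValueError("all frames must have the same shape")
    return [fmean(values) for values in zip(*flat)]


def build_table(videos):
    """Map {patient_id: [frame, ...]} to one row of pixel columns per patient."""
    return {patient: blend_frames(frames) for patient, frames in videos.items()}


# Toy data: two 2x2 grayscale frames per patient (hypothetical values).
videos = {
    "patient_a": [[[0, 50], [100, 150]], [[10, 60], [110, 160]]],
    "patient_b": [[[255, 255], [0, 0]], [[255, 245], [10, 0]]],
}
table = build_table(videos)
for patient, row in table.items():
    print(patient, row)
```

Each patient ends up as one row whose length equals the number of pixels in a frame, which is why real scans produce a very wide table and why a dimensionality reduction step follows.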

Data availability

The dataset utilized in this research is sourced from "The Cancer Imaging Archive (TCIA)," accessible at https://imaging.cancer.gov/informatics/cancer_imaging_archive.htm. TCIA provides a comprehensive collection of openly available medical images, fostering collaborative research in cancer imaging.

Code Availability

The code implemented in this study is openly available at the following URL: https://colab.research.google.com/drive/1JKq3NV4wCkN1tTbwVkU-bNmL4qt2NPpy?usp=sharing. Researchers interested in reproducing or building upon the methods employed in this paper are encouraged to access the provided code. This transparency aims to facilitate the reproducibility of our findings and promote collaborative efforts in advancing cancer imaging research.

Notes

More information can be found in: https://keras.io/examples/vision/video_classification/.
For more information refer to: https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html.
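
The second note points to SMOTE, which suggests that this is the oversampling technique used, although the abstract does not name it. The sketch below shows only the core idea, synthetic minority rows created by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified illustration in plain Python, not the imbalanced-learn implementation; the function name `smote_like`, its parameters, and the toy data are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to `base`.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature data: five majority rows, three minority rows.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

new_rows = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Because every synthetic row lies on a segment between two real minority rows, the new samples stay inside the region the minority class already occupies. In a real evaluation, oversampling is normally applied to the training split only, so that synthetic rows do not leak into the test set.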

Acknowledgements

This research has been supported by the “Sistemas Inteligentes de Soporte a la Educación Especial (SINSAE v5)” research project of the UNESCO Chair on Support Technologies for Educational Inclusion.

Author information

Authors and Affiliations

GI-IATa, Cátedra UNESCO Tecnologías de apoyo para la Inclusión Educativa, Universidad Politécnica Salesiana, Cuenca, Ecuador
Adolfo Jara-Gavilanes & Vladimir Robles-Bykbaev

Corresponding author

Correspondence to Vladimir Robles-Bykbaev.

Ethics declarations

Conflict of Interest

The authors of this paper declare that they have no conflicts of interest that could potentially influence or bias the interpretation of the research presented. There are no financial, personal, or professional relationships that might be perceived as having influenced the work reported in this manuscript. This includes, but is not limited to, financial interests, affiliations, or relationships with organizations that may have a direct or indirect interest in the subject matter discussed in the paper. We affirm that this work is conducted with integrity and in compliance with ethical standards. Any external factors that could pose a conflict of interest have been disclosed in an honest and transparent manner.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Emerging Technologies in Applied Informatics” guest edited by Hector Florez and Marcelo Leon.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jara-Gavilanes, A., Robles-Bykbaev, V. Lung Cancer Detection: A Classification Approach Utilizing Oversampling and Support Vector Machines. SN COMPUT. SCI. 5, 74 (2024). https://doi.org/10.1007/s42979-023-02432-6

Received: 15 September 2023
Accepted: 17 October 2023
Published: 08 December 2023
DOI: https://doi.org/10.1007/s42979-023-02432-6
