Lung Cancer Detection: A Classification Approach Utilizing Oversampling and Support Vector Machines

  • Original Research
  • SN Computer Science

Abstract

Lung cancer is the type of cancer that causes the most deaths each year, and it is also the cancer with the lowest survival rate, which makes it a worldwide health problem. Lung cancer has two subtypes: Non-Small Cell Lung Cancer (NSCLC) and Small Cell Lung Cancer (SCLC). It can be hard for doctors to detect and differentiate them. In this work, we therefore present a method to help doctors with this task. It consists of three phases. The first phase is image preprocessing: the data are gathered, the PET scans are selected, all the scans are converted to grayscale images, and finally the images are joined to create a video from each patient’s scan. The second phase is data extraction: some frames are extracted from each video, and they are flattened and blended to create a row of information from the extracted frames. The result is a dataframe in which each row represents a patient and each column is a pixel value. An oversampling technique is then applied to balance the classes and obtain better results. Next, a dimensionality reduction technique is applied to reduce the number of columns produced by the previous steps and to check whether it improves the results yielded by each model. The third phase is model evaluation, in which two models are created: a Support Vector Machine (SVM) and a Random Forest. The results show that the SVM was the top-performing model, with 97% accuracy, 98% precision, and 97% sensitivity. In the future, this method could be applied to detect and classify other diseases that involve PET scans.
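
The data extraction phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how frames are "blended", so a pixel-wise mean is assumed here, and the names flatten, blend, build_table, and patient_01 are hypothetical. Plain Python lists are used to keep the example dependency-free; a real pipeline would more likely use NumPy arrays and a pandas dataframe.

```python
from statistics import fmean


def flatten(frame):
    """Flatten a 2-D grayscale frame (a list of pixel rows) into one list."""
    return [pixel for row in frame for pixel in row]


def blend(frames):
    """Blend one patient's frames into a single row of pixel values.

    Assumption: "blending" means a pixel-wise mean over the frames.
    """
    flat = [flatten(frame) for frame in frames]
    if len({len(pixels) for pixels in flat}) != 1:
        raise ValueError("expected at least one frame, all of the same size")
    return [fmean(column) for column in zip(*flat)]


def build_table(patients):
    """Map each (hypothetical) patient id to its blended pixel row."""
    return {patient_id: blend(frames) for patient_id, frames in patients.items()}


# Toy data: one patient with two 2x2 grayscale frames.
scans = {"patient_01": [[[0, 2], [4, 6]], [[2, 4], [6, 8]]]}
table = build_table(scans)
print(table["patient_01"])  # [1.0, 3.0, 5.0, 7.0]
```

Each entry of the resulting table plays the role of one dataframe row: one patient, with one column per pixel position.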

Data availability

The dataset utilized in this research is sourced from "The Cancer Imaging Archive (TCIA)," accessible at https://imaging.cancer.gov/informatics/cancer_imaging_archive.htm. TCIA provides a comprehensive collection of openly available medical images, fostering collaborative research in cancer imaging.

Code Availability

The code implemented in this study is openly available at the following URL: https://colab.research.google.com/drive/1JKq3NV4wCkN1tTbwVkU-bNmL4qt2NPpy?usp=sharing. Researchers interested in reproducing or building upon the methods employed in this paper are encouraged to access the provided code. This transparency aims to facilitate the reproducibility of our findings and promote collaborative efforts in advancing cancer imaging research.

Notes

  1. More information can be found in: https://keras.io/examples/vision/video_classification/.

  2. For more information refer to: https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html.
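
The idea behind the SMOTE oversampling technique referenced in note 2 can be sketched as follows. This is a simplified, dependency-free illustration of the interpolation step only, not the imbalanced-learn implementation and not the authors' code; the function name smote_like, its parameters, and the toy data are assumptions made for the example.

```python
import math
import random


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    Each synthetic row lies on the segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base row, excluding itself.
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Toy minority class with three 2-D rows; generate four synthetic rows.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_rows = smote_like(minority, n_new=4, k=2)
print(len(new_rows))  # 4
```

Because every synthetic row is an interpolation between two existing minority rows, the new rows stay inside the region already covered by the minority class instead of duplicating samples exactly.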

Acknowledgements

This research has been supported by the “Sistemas Inteligentes de Soporte a la Educación Especial (SINSAE v5)” research project of the UNESCO Chair on Support Technologies for Educational Inclusion.

Author information

Corresponding author

Correspondence to Vladimir Robles-Bykbaev.

Ethics declarations

Conflict of Interest

The authors of this paper declare that they have no conflicts of interest that could potentially influence or bias the interpretation of the research presented. There are no financial, personal, or professional relationships that might be perceived as having influenced the work reported in this manuscript. This includes, but is not limited to, financial interests, affiliations, or relationships with organizations that may have a direct or indirect interest in the subject matter discussed in the paper. We affirm that this work is conducted with integrity and in compliance with ethical standards. Any external factors that could pose a conflict of interest have been disclosed in an honest and transparent manner.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Emerging Technologies in Applied Informatics” guest edited by Hector Florez and Marcelo Leon.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jara-Gavilanes, A., Robles-Bykbaev, V. Lung Cancer Detection: A Classification Approach Utilizing Oversampling and Support Vector Machines. SN COMPUT. SCI. 5, 74 (2024). https://doi.org/10.1007/s42979-023-02432-6

  • DOI: https://doi.org/10.1007/s42979-023-02432-6