Classification Analysis of Some Cancer Types Using Machine Learning

  • Conference paper
Safe, Secure, Ethical, Responsible Technologies and Emerging Applications (SAFER-TEA 2023)

Abstract

Cancer is a disease caused by changes in deoxyribonucleic acid (DNA) that make cells in the body grow uncontrollably and spread to other parts of the body. Cancer can be deadly. Because it can develop anywhere in the body, there are many types of cancer. A good diagnosis increases the probability of administering a good treatment and saving life, and the appropriate treatment option depends heavily on the type of cancer; several diagnostic methods have therefore been developed to reduce the mortality rate from cancer. In this work, we address the classification of some cancer types, using supervised learning methods to classify prostate cancer, lymphoma, leukaemia and small round blue cell tumour. Specifically, we used five models: support vector machine, decision tree, random forest, K-nearest neighbours (KNN) and artificial neural network. Each model was trained on each cancer dataset using the Google Colab graphics processing unit (GPU). The test samples were classified for each cancer type, and the performances of the five models were compared, as percentages, across several metrics. To reduce the dimension of the data, we incorporated a new approach that involves performing principal component analysis on our dataset. With this approach, the KNN method performed best on our datasets, with 90% accuracy for prostate cancer and 100% for the others.
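The workflow summarised in the abstract (reduce the dimension with principal component analysis, then train and compare five classifiers) can be illustrated with a minimal sketch. This is not the authors' code: the use of scikit-learn, the synthetic dataset standing in for gene expression data, the number of principal components, and all hyperparameters are assumptions made purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for a gene-expression matrix
# (few samples, many features); the paper's datasets are not used here.
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=20,
    n_classes=2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The five model families named in the abstract; hyperparameters are
# illustrative defaults, not values reported by the authors.
models = {
    "SVM": SVC(kernel="linear"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Neural network": MLPClassifier(
        hidden_layer_sizes=(32,), max_iter=1000, random_state=0
    ),
}


def evaluate(models, n_components=10):
    """Fit scaler -> PCA -> classifier on the training split and
    return the test accuracy of each model."""
    scores = {}
    for name, clf in models.items():
        # Scaling and PCA are fitted on the training data only,
        # then applied unchanged to the test data.
        pipe = make_pipeline(
            StandardScaler(), PCA(n_components=n_components), clf
        )
        pipe.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    return scores


scores = evaluate(models)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2%}")
```

Wrapping the scaler and PCA in the same pipeline as the classifier keeps the projection from seeing the test samples, which matters when the number of features far exceeds the number of samples. The accuracies printed for this synthetic data say nothing about the results reported in the paper.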

Author information

Corresponding author

Correspondence to Scott Ulrich Jemea Ebolo.

Copyright information

© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Ebolo, S.U.J., Makinde, O.S., Mpinda, B.N. (2024). Classification Analysis of Some Cancer Types Using Machine Learning. In: Tchakounte, F., Atemkeng, M., Rajagopalan, R.P. (eds) Safe, Secure, Ethical, Responsible Technologies and Emerging Applications. SAFER-TEA 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 566. Springer, Cham. https://doi.org/10.1007/978-3-031-56396-6_14

  • DOI: https://doi.org/10.1007/978-3-031-56396-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56395-9

  • Online ISBN: 978-3-031-56396-6

  • eBook Packages: Computer Science, Computer Science (R0)
