Abstract
Cancer is a disease caused by changes in deoxyribonucleic acid (DNA) that make cells grow uncontrollably and spread to other parts of the body. Cancer can be deadly, and because it can develop anywhere in the body, there are many types of cancer. The appropriate treatment depends heavily on the type of cancer, so an accurate diagnosis increases the probability of administering an effective treatment and saving life. Several diagnostic methods have therefore been developed to reduce the mortality rate from cancer. In this work, we address the classification of four cancer types (prostate cancer, lymphoma, leukaemia and small round blue cell tumour) using supervised learning methods. Specifically, we used five models: support vector machine, decision tree, random forest, K-nearest neighbours (KNN) and artificial neural network. Each of the five models was trained on each cancer dataset using a Google Colab graphics processing unit (GPU). The test samples were classified for each cancer type, and the performance of the five models was compared using several metrics, expressed as percentages. To reduce the dimension of the data, we incorporated a new approach that involves performing principal component analysis on each dataset. With this approach, KNN was the best-performing method on our datasets, with 90% accuracy for prostate cancer and 100% for the other three cancer types.
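The workflow summarised in the abstract (dimension reduction with principal component analysis, followed by a comparison of five supervised classifiers) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses a synthetic stand-in for a gene expression dataset, and the function name `compare_models`, the number of components, the train/test split and all hyperparameters are hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, n_components=10, seed=0):
    """Fit five classifiers on PCA-reduced data and return test accuracy per model.

    n_components, the 70/30 split and the hyperparameters are assumptions
    for this sketch; they are not taken from the paper.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "svm": SVC(kernel="linear"),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "neural_network": MLPClassifier(
            hidden_layer_sizes=(32,), max_iter=1000, random_state=seed
        ),
    }
    scores = {}
    for name, clf in models.items():
        # Scaling and PCA are fitted on the training split only, so no
        # information from the test samples leaks into the projection.
        pipe = make_pipeline(StandardScaler(), PCA(n_components=n_components), clf)
        pipe.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    return scores


# Synthetic stand-in for a microarray dataset: few samples, many features.
X, y = make_classification(
    n_samples=100, n_features=2000, n_informative=20, n_classes=2, random_state=0
)
scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} accuracy = {100 * acc:.1f}%")
```

Wrapping the scaler, PCA and classifier in one pipeline keeps the comparison fair: every model sees the same reduced representation, and the projection is learned from training data alone. The accuracies printed for the synthetic data say nothing about the results reported in the abstract.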
© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Ebolo, S.U.J., Makinde, O.S., Mpinda, B.N. (2024). Classification Analysis of Some Cancer Types Using Machine Learning. In: Tchakounte, F., Atemkeng, M., Rajagopalan, R.P. (eds) Safe, Secure, Ethical, Responsible Technologies and Emerging Applications. SAFER-TEA 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 566. Springer, Cham. https://doi.org/10.1007/978-3-031-56396-6_14
DOI: https://doi.org/10.1007/978-3-031-56396-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56395-9
Online ISBN: 978-3-031-56396-6
eBook Packages: Computer Science (R0)