A reliable method for colorectal cancer prediction based on feature selection and support vector machine

  • ORIGINAL ARTICLE
Medical & Biological Engineering & Computing

Abstract

Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide, so it is important to identify the related factors and to detect the cancer accurately. However, timely and accurate prediction of the disease is challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify samples as cancer or normal. From candidate factors covering the individual (location, age, gender, and body mass index (BMI)) and the cancer (tumor type, tumor grade, and DNA), we use logistic regression to select the most significant factors (p < 0.05) as the main features. With these features, we design a grid-search SVM model and compare different kernel types (linear, radial basis function (RBF), sigmoid, and polynomial). The logistic regression results indicate that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), BMI (AUC 0.777), and age (AUC 0.710), as well as their combination (AUC 0.942), are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors.
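The two-stage idea in the abstract (logistic regression to keep factors with p < 0.05, then a grid-search SVM over four kernel types) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes scikit-learn, NumPy, and SciPy, uses synthetic data in place of the study's data, computes Wald p-values by hand, and reads k as the number of cross-validation folds, which the abstract does not state. The function names `wald_p_values` and `fit_grid_search_svm`, the parameter grid, and the random seed are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def wald_p_values(X, y):
    """Two-sided Wald p-value for each feature coefficient of a logistic regression."""
    Xs = StandardScaler().fit_transform(X)
    # A very large C makes the L2 penalty negligible (close to plain maximum likelihood).
    model = LogisticRegression(C=1e6, max_iter=5000).fit(Xs, y)
    design = np.hstack([np.ones((Xs.shape[0], 1)), Xs])  # intercept column first
    prob = model.predict_proba(Xs)[:, 1]
    # Fisher information X^T W X with W = diag(p * (1 - p)).
    fisher = design.T @ (design * (prob * (1 - prob))[:, None])
    std_err = np.sqrt(np.diag(np.linalg.inv(fisher)))
    z = np.concatenate([model.intercept_, model.coef_.ravel()]) / std_err
    return (2 * stats.norm.sf(np.abs(z)))[1:]  # drop the intercept's p-value


def fit_grid_search_svm(X, y, n_folds=5):
    """Grid search over kernel type, C, and gamma with stratified k-fold CV."""
    pipeline = make_pipeline(StandardScaler(), SVC())
    grid = {
        "svc__kernel": ["linear", "rbf", "sigmoid", "poly"],
        "svc__C": [0.1, 1, 10],            # illustrative values only
        "svc__gamma": ["scale", 0.01, 0.1],  # ignored by the linear kernel
    }
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    return GridSearchCV(pipeline, grid, cv=cv, scoring="accuracy").fit(X, y)


# Synthetic stand-in data: 6 candidate factors, only the first two carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(400) < 1 / (1 + np.exp(-logit))).astype(int)

p_values = wald_p_values(X, y)
selected = np.flatnonzero(p_values < 0.05)  # keep factors with p < 0.05
search = fit_grid_search_svm(X[:, selected], y, n_folds=5)

print("selected feature indices:", selected)
print("best kernel:", search.best_params_["svc__kernel"])
print("cross-validated accuracy:", round(search.best_score_, 3))
```

In this sketch the features are selected once on the full data set before cross-validation, which keeps the example short. A stricter protocol would repeat the selection inside each fold so that the held-out samples never influence which factors are kept.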

Flow chart depicting the method adopted in the study. LR (logistic regression) and the ROC curve are used to select independent features as input to the SVM. SVM kernel selection aims to find the best kernel function for classification by comparing the linear, RBF, sigmoid, and polynomial kernels; the results show that RBF is the best. The classification performance of the LR + RF, LR + NB, LR + KNN, and LR + ANN models is compared with that of LR + SVM. After these steps, cancer and healthy individuals can be classified, and the best model is selected.
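The model comparison step in the flow chart can be sketched in the same spirit. The example below is an illustration under stated assumptions, not the study's pipeline: it reads RF, NB, KNN, and ANN as random forest, naive Bayes, k-nearest neighbors, and an artificial neural network, uses scikit-learn implementations with arbitrary hyperparameters, and evaluates every model on the same 10 stratified folds of synthetic data that stands in for the already selected features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the selected features (4 columns) and class labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.7, size=300) > 0).astype(int)

# Candidate classifiers; all hyperparameters here are illustrative assumptions.
models = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "NB": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Reuse one fold definition so every model is scored on identical splits.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Report mean accuracy per model, highest first.
for name, accuracy in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {accuracy:.3f}")

best_model = max(scores, key=scores.get)
print("best model on this synthetic data:", best_model)
```

Which model ranks first on synthetic data says nothing about the study's result; the point is only that sharing the fold definition makes the accuracies directly comparable.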

Funding

This research is supported by the National Natural Science Foundation of China (61876102, 61472232, 61572300, 61402270, 61602286), Taishan Scholar Program of Shandong Province in China (TSHW201502038), and Natural Science Foundation of Shandong Province in China (ZR2016FB13).

Author information

Corresponding author

Correspondence to Hong Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zhao, D., Liu, H., Zheng, Y. et al. A reliable method for colorectal cancer prediction based on feature selection and support vector machine. Med Biol Eng Comput 57, 901–912 (2019). https://doi.org/10.1007/s11517-018-1930-0

  • DOI: https://doi.org/10.1007/s11517-018-1930-0
