Abstract
Accurate prediction of cervical carcinoma (CC) depends on optimal feature selection and balanced data. This work addresses two problems: the imbalanced distribution of majority and minority class samples, and the validation of risk factors for cervical cancer prediction. The Feature Weighted Synthetic Minority Oversampling Technique (FWSMOTE) algorithm handles the minority class imbalance. Missing values are imputed using mode and median imputation. For optimal feature selection, the Hilbert–Schmidt Independence Criterion with Bacteria Forage Optimization (HSICBFO) algorithm is implemented. An Ensemble Support Vector Machine with Interpolation classifier is used for cancer prediction. The proposed classifier achieves 94.77% precision, 93.38% recall, 93.86% specificity, 94.07% F-Measure, 93.60% accuracy and 93.62% G-mean. These results help identify the risk level of cervical carcinoma development and guide further diagnosis.
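The abstract reports the performance measures without defining them. As a minimal sketch, assuming the standard definitions (F-Measure as the harmonic mean of precision and recall, G-mean as the geometric mean of recall and specificity), the two derived measures can be recomputed from the reported precision, recall and specificity. The function names `f_measure` and `g_mean` are illustrative and do not come from the paper.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


def g_mean(recall: float, specificity: float) -> float:
    """Geometric mean of recall (sensitivity) and specificity (assumed definition)."""
    return math.sqrt(recall * specificity)


# Values reported in the abstract, expressed as fractions.
precision, recall, specificity = 0.9477, 0.9338, 0.9386

print(f"F-Measure: {100 * f_measure(precision, recall):.2f}%")
print(f"G-mean:    {100 * g_mean(recall, specificity):.2f}%")
```

Under these assumed definitions the script prints 94.07% and 93.62%, which agrees with the F-Measure and G-mean stated in the abstract. Accuracy cannot be checked the same way because it also depends on the class proportions, which the abstract does not give.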
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Geeitha, S., Thangamani, M. Integrating HSICBFO and FWSMOTE algorithm-prediction through risk factors in cervical cancer. J Ambient Intell Human Comput 12, 3213–3225 (2021). https://doi.org/10.1007/s12652-020-02194-6
DOI: https://doi.org/10.1007/s12652-020-02194-6