Abstract
Cervical cancer is the fourth most common malignant disease among women worldwide. In most cases, its symptoms are not noticeable in the early stages. Several factors increase the risk of developing cervical cancer, such as human papillomavirus, sexually transmitted diseases, and smoking. Identifying those factors and building a classification model that determines whether or not a case is cervical cancer remains a challenging research problem. This study aims to use cervical cancer risk factors to build a classification model using the Random Forest (RF) classification technique with the synthetic minority oversampling technique (SMOTE) and two feature reduction techniques: recursive feature elimination and principal component analysis (PCA). Most medical data sets are imbalanced because the number of patients is considerably smaller than the number of non-patients. SMOTE is used to address the imbalance in the data set used here. The data set comprises 32 risk factors and four target variables: Hinselmann, Schiller, Cytology and Biopsy. The accuracy, sensitivity, specificity, PPA and NPA for the four target variables remain accurate after SMOTE when compared with the values obtained before it. An RSOnto ontology has been created to visualize the progress in classification performance.
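The oversampling step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name `smote`, the parameters `k` and `seed`, the uniform random choice of base sample, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper for illustration; base samples are picked uniformly
    at random, which is a simplification of the published procedure.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up toy data: 8 majority (negative) and 3 minority (positive) samples.
negatives = [[float(x), float(y)] for x in range(4) for y in range(2)]
positives = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0]]

# Generate enough synthetic positives to match the number of negatives.
new_points = smote(positives, n_new=len(negatives) - len(positives), k=2)
balanced_positives = positives + new_points
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region the minority class already occupies. In a pipeline like the one the abstract describes, oversampling of this kind would normally be applied to the training split only, so that synthetic samples do not leak into the evaluation data.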
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
This article is part of the Topical Collection on Patient Facing Systems
About this article
Cite this article
Geetha, R., Sivasubramanian, S., Kaliappan, M. et al. Cervical Cancer Identification with Synthetic Minority Oversampling Technique and PCA Analysis using Random Forest Classifier. J Med Syst 43, 286 (2019). https://doi.org/10.1007/s10916-019-1402-6
DOI: https://doi.org/10.1007/s10916-019-1402-6