Abstract
Prostate cancer is the fourth most common cancer overall and the second most common cancer in men. Prostate cancer incidence is rising faster than overall cancer incidence, and developed countries account for 68% of prostate cancer cases. There has been very little research on the most suitable techniques for analysing prostate cancer gene expression datasets to identify the genes that may be most related to prostate cancer. This paper attempts to identify significant (influential) attributes in a well-established prostate cancer gene expression dataset consisting of over 12,533 attributes for 102 samples (50 normal, 52 tumour). Seven different statistical and artificial intelligence (AI)-based feature selection methods were paired with four different classifiers, namely ANNs, Naive Bayes, AdaBoost and J48. Prediction experiments were carried out using ANNs, with testing on unseen samples. In our experiments, ANNs outperformed all other approaches for classification when paired with sequential forward feature selection (SFFS), achieving 100% accuracy. Naive Bayes and AdaBoost achieved best accuracies of 96.3% and 93.13%, respectively, with support vector machine (SVM) attribute selection, whereas J48 reached only 89.21% with the SFFS approach. For prediction experiments, ANNs obtained an accuracy of 95.1% with SVM attribute selection (correctly predicting 96 out of 102 samples). Finally, a search of the National Center for Biotechnology Information database found that 21 of the 24 attributes (87.5%) chosen by SVM attribute selection have a reference to cancer/tumour, thereby establishing a link between feature selection and biological plausibility. The main contribution of this paper is in identifying the importance of pairing the most appropriate feature selection strategy with the most appropriate classification strategy when dealing with significantly underdetermined data.
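The abstract names sequential forward feature selection but gives no implementation details. As a rough illustration only, a greedy forward selection wrapper might look like the sketch below. It is not the authors' code and it omits the backtracking step of floating search variants. The function name, the `score` callback (a stand-in for classifier accuracy on a candidate subset) and the toy gene names are all hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> List[str]:
    """Greedily add the attribute that most improves `score`.

    `score` is a hypothetical callback: in practice it would train and
    evaluate a classifier on the given attribute subset.
    """
    selected: List[str] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Score every subset formed by adding one more attribute.
        trial_scores = {c: score(frozenset(selected + [c])) for c in remaining}
        best_candidate = max(remaining, key=lambda c: trial_scores[c])
        # Stop as soon as no single addition improves the score.
        if trial_scores[best_candidate] <= best_score:
            break
        selected.append(best_candidate)
        best_score = trial_scores[best_candidate]
    return selected


# Toy scoring function (an assumption for illustration): two attributes
# carry signal, and every attribute costs a small penalty.
TOY_WEIGHTS = {"g1": 0.3, "g2": 0.2}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(TOY_WEIGHTS.get(g, 0.0) for g in subset) - 0.01 * len(subset)


print(sequential_forward_selection(["g1", "g2", "g3", "g4"], toy_score, 3))
```

With so few samples relative to attributes, the choice of `score` matters: an accuracy estimate computed on the same samples used for selection would be optimistic, which is one reason to test on unseen samples.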
This paper also highlights the differences and similarities between classification and prediction of prostate cancer. We considered a further approach in the classification and prediction experiments: in addition to using the seven feature selection approaches individually, we derived new sets of attributes by combining all selected attributes (union), keeping only the attributes common to the methods (intersection) and taking the remaining attributes (not common).
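The derived attribute sets can be illustrated with plain set operations. The sketch below is a minimal example under stated assumptions: the abstract does not define "not common" precisely, so it is read here as the attributes in the union that are missing from the intersection. The function name, the method label `method_3` and the gene identifiers are hypothetical.

```python
from typing import Dict, Set


def combine_selections(selections: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Derive union, intersection and not-common attribute sets.

    `selections` maps a feature selection method name to the set of
    attributes that method selected.
    """
    if not selections:
        raise ValueError("at least one selection is required")
    subsets = [set(s) for s in selections.values()]
    union = set().union(*subsets)
    intersection = subsets[0].intersection(*subsets[1:])
    # Assumption: "not common" means selected by some but not all methods.
    not_common = union - intersection
    return {
        "union": union,
        "intersection": intersection,
        "not_common": not_common,
    }


# Hypothetical attribute identifiers, for illustration only.
example = {
    "SVM": {"gene_a", "gene_b", "gene_c"},
    "SFFS": {"gene_b", "gene_c", "gene_d"},
    "method_3": {"gene_c", "gene_e"},
}
derived = combine_selections(example)
for name, attributes in derived.items():
    print(name, sorted(attributes))
```

Each derived set can then be passed to a classifier in the same way as the output of a single feature selection method.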
Ethics declarations
Conflict of interest
The author(s) whose names are listed above certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Cite this article
Tirumala, S.S., Narayanan, A. Classification and diagnostic prediction of prostate cancer using gene expression and artificial neural networks. Neural Comput & Applic 31, 7539–7548 (2019). https://doi.org/10.1007/s00521-018-3589-8
DOI: https://doi.org/10.1007/s00521-018-3589-8