Abstract
Microarray gene expression profiles can be exploited for the efficient and effective classification of cancers. This is a computationally challenging task because gene expression data contain a large number of genes but a relatively small number of experiments. The aim of this work is to devise a framework of supervised machine learning techniques for discriminating acute lymphoblastic leukemia from acute myeloid leukemia using microarray gene expression profiles. An artificial neural network (ANN) was employed for this classification and compared with five other machine learning techniques. All methods were assessed on eight classification performance measures. This article reports a classification accuracy of 98% using ANN, with no errors in the identification of acute lymphoblastic leukemia and only one error in the identification of acute myeloid leukemia, under tenfold cross-validation and the leave-one-out approach. Furthermore, the models were validated on independent test data, and all samples were correctly classified.
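The evaluation protocol named above (tenfold and leave-one-out cross-validation, scored with classification performance measures) can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the nearest-centroid classifier is a stand-in for the ANN, the six two-feature samples are synthetic toy values rather than microarray data, and accuracy, sensitivity, and specificity are assumed examples, since the abstract does not list its eight measures. All function names are hypothetical.

```python
from math import dist

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k nearly equal, contiguous test folds.
    Real data would normally be shuffled or stratified first."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (NOT the article's ANN): closest class mean wins."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: dist(centroids[label], sample))

def cross_validate(xs, ys, k):
    """Return one out-of-fold prediction per sample.
    Setting k == len(xs) gives leave-one-out cross-validation."""
    predictions = [None] * len(xs)
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in test_idx:
            predictions[i] = nearest_centroid_predict(train_x, train_y, xs[i])
    return predictions

def binary_measures(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity from the 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

# Synthetic toy samples (two made-up expression values each), interleaved
# by class so that every contiguous fold leaves both classes in training.
xs = [(1.0, 0.2), (0.2, 1.0), (0.9, 0.1), (0.1, 0.9), (1.1, 0.3), (0.3, 1.1)]
ys = ["ALL", "AML", "ALL", "AML", "ALL", "AML"]

loo_predictions = cross_validate(xs, ys, k=len(xs))  # leave-one-out
print(binary_measures(ys, loo_predictions, positive="ALL"))
```

With real expression data the number of genes far exceeds the number of samples, so any gene selection step would have to be repeated inside each training fold; otherwise the held-out samples leak into the model and the reported accuracy is inflated.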
Acknowledgements
The authors are extremely thankful to the Department of Biotechnology, New Delhi, for providing the Bioinformatics Infrastructure Facility of DBT at MANIT Bhopal.
Ethics declarations
Conflict of interest
None.
About this article
Cite this article
Dwivedi, A.K. Artificial neural network model for effective cancer classification using microarray gene expression data. Neural Comput & Applic 29, 1545–1554 (2018). https://doi.org/10.1007/s00521-016-2701-1
DOI: https://doi.org/10.1007/s00521-016-2701-1