Abstract
Artificial intelligence (AI) has proven to be a powerful tool for managing big healthcare data, and it has seen considerable success in bioinformatics. The growth of big data in the biological sciences has driven the adoption of big data analytics (BDA) and AI. Because the AI methods used in bioinformatics are parallel and iterative, scalable big data management using distributed and parallel technology is possible. The growth of bioinformatics has created significant storage and management challenges; to share information, such large volumes of data must be handled efficiently. Advances in computing have enabled analytical systems to cope with such data. This study therefore emphasizes the impact of big data and BDA in bioinformatics. As a practical application of BDA and AI to cancer classification, a hybrid feature selection method combining Analysis of Variance (ANOVA) with Ant Colony Optimization (ACO) was used to select significant genes while minimizing gene redundancy, and a Deep Convolutional Neural Network (DCNN) was employed to classify the datasets. Feature selection is needed because microarray gene expression data typically have a limited number of samples but a very large number of features. On the same datasets, the proposed system outperformed earlier state-of-the-art approaches, achieving average classification accuracies of 97.7%, 99.9%, 99.9%, and 100% on the Leukemia, DLBCL, Colon, and SRBCT datasets, respectively.
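The two-stage gene selection described in the abstract can be illustrated with a minimal sketch, assuming a setup the abstract does not specify. The ANOVA stage computes a one-way F statistic per gene; the ACO stage is a simplified, filter-style search whose score (mean relevance minus mean absolute pairwise correlation) is an assumption chosen for illustration, not the fitness function of the paper. The function names `anova_f_scores` and `aco_select`, all parameter values, and the synthetic data are hypothetical, and the DCNN classifier is omitted.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F statistic for every column (gene) of X."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    # Small constant guards against division by zero for constant genes.
    return (ss_between / (k - 1)) / (ss_within / (n - k) + 1e-12)

def aco_select(X, f, subset_size, n_ants=20, n_iter=30, rho=0.2, seed=0):
    """Toy ACO (illustrative assumption): ants sample gene subsets, and
    pheromone is reinforced on the best relevant, low-redundancy subset."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))  # gene-gene redundancy
    relevance = f / f.max()
    pheromone = np.ones(m)
    best_subset, best_score = None, -np.inf
    for _ in range(n_iter):
        prob = pheromone * (relevance + 1e-9)
        prob = prob / prob.sum()
        for _ in range(n_ants):
            subset = rng.choice(m, size=subset_size, replace=False, p=prob)
            block = corr[np.ix_(subset, subset)]
            # Mean absolute correlation over off-diagonal pairs.
            redundancy = (block.sum() - subset_size) / (subset_size * (subset_size - 1))
            score = relevance[subset].mean() - redundancy
            if score > best_score:
                best_subset, best_score = subset, score
        pheromone *= 1 - rho            # evaporation
        pheromone[best_subset] += rho   # reinforce the best subset so far
    return np.sort(best_subset), best_score

# Synthetic stand-in for microarray data: few samples, many genes.
rng = np.random.default_rng(42)
n_samples, n_genes = 60, 500
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :10] += 2.0  # only the first 10 genes carry class signal

f = anova_f_scores(X, y)
top = np.argsort(f)[::-1][:50]  # ANOVA filter: keep the 50 highest-F genes
chosen_local, score = aco_select(X[:, top], f[top], subset_size=8)
selected_genes = top[chosen_local]  # indices into the original gene set
```

In a full pipeline the expression values of `selected_genes` would be passed to the classifier. Running the ANOVA filter first keeps the ACO search space small, which matters when the number of genes far exceeds the number of samples.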
Data availability
The datasets used in this paper are publicly available as follows: (a) Small Round Blue Cell Tumor (SRBCT) dataset: https://rdrr.io/github/sarahromanes/multiDA/man/SRBCT.html. (b) Breast cancer dataset: https://arup.utah.edu/database/BRCA/, https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data. (c) Leukemia dataset: https://www.openintro.org/data/index.php?data=golub, https://www.kaggle.com/datasets/crawford/gene-expression. (d) DLBCL dataset: https://ecotyper.stanford.edu/lymphoma/. (e) Colon cancer dataset: http://biogps.org/dataset/tag/colon%20cancer/, https://www.kaggle.com/datasets/andrewmvd/lung-and-colon-cancer-histopathological-images.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
All authors designed the study, developed the methodology, performed the analysis, and wrote the manuscript. All authors contributed equally.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
About this article
Cite this article
Awotunde, J.B., Panigrahi, R., Shukla, S. et al. Big data analytics enabled deep convolutional neural network for the diagnosis of cancer. Knowl Inf Syst 66, 905–931 (2024). https://doi.org/10.1007/s10115-023-01971-x
DOI: https://doi.org/10.1007/s10115-023-01971-x