Big data analytics enabled deep convolutional neural network for the diagnosis of cancer

  • Regular Paper
Knowledge and Information Systems

Abstract

Artificial intelligence (AI) has proven to be a powerful tool for managing big healthcare data and has seen considerable success in bioinformatics. The growth of big data in the biological sciences has driven the adoption of big data analytics (BDA) and AI. Because the AI methods used in bioinformatics are parallel and iterative, they lend themselves to scalable big data management on distributed and parallel platforms. The growth of bioinformatics has created significant storage and management challenges: such large volumes of data must be handled efficiently before they can be shared. Advances in computing have enabled analytical systems to cope with such data. This study therefore examines the impact of big data and BDA on bioinformatics. As a practical application of BDA and AI to cancer classification, it presents a hybrid feature selection method that combines an Analysis of Variance (ANOVA) approach with Ant Colony Optimization (ACO) to select significant genes while minimizing gene redundancy. Feature selection is needed because microarray gene expression data typically have few samples but a very large number of features. A Deep Convolutional Neural Network (DCNN) was then used to classify the datasets. On the same datasets, the proposed system outperformed earlier state-of-the-art approaches, achieving average classification accuracies of 97.7%, 99.9%, 99.9%, and 100% on the Leukemia, DLBCL, Colon, and SRBCT datasets, respectively.
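To make the ANOVA filtering idea concrete, the following is a minimal sketch of how a one-way ANOVA F-statistic can rank genes by how well they separate classes. It is not the authors' implementation: the function names `anova_f` and `top_k_genes`, the toy expression matrix, and the choice of `k` are all assumptions made for illustration, and the ACO search and DCNN classifier described in the abstract are omitted entirely.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for one gene (hypothetical helper)."""
    # Group the expression values of this gene by class label.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 classes and more samples than classes")
    grand = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_genes(X, labels, k):
    """Rank gene columns of X by F-statistic; return top-k indices and all scores."""
    n_genes = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy data (made up): 6 samples x 3 genes, two classes.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 4.0, 8.0],
    [0.9, 6.0, 5.0],
    [3.0, 5.5, 3.0],
    [3.1, 4.5, 7.0],
    [2.9, 5.0, 4.0],
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

selected, scores = top_k_genes(X, labels, k=1)
print(selected, scores)
```

In this toy example only the first gene differs clearly between the two classes, so it receives the highest score. In a hybrid scheme such as the one the abstract describes, a filter of this kind would presumably shrink the candidate gene set before a search method such as ACO looks for a small, non-redundant subset.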

Data availability

The datasets used in this paper are publicly available at the following locations:
(a) Small Round Blue Cell Tumor (SRBCT) dataset: https://rdrr.io/github/sarahromanes/multiDA/man/SRBCT.html
(b) Breast cancer dataset: https://arup.utah.edu/database/BRCA/, https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data
(c) Leukemia dataset: https://www.openintro.org/data/index.php?data=golub, https://www.kaggle.com/datasets/crawford/gene-expression
(d) DLBCL dataset: https://ecotyper.stanford.edu/lymphoma/
(e) Colon cancer dataset: http://biogps.org/dataset/tag/colon%20cancer/, https://www.kaggle.com/datasets/andrewmvd/lung-and-colon-cancer-histopathological-images

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Contributions

All authors designed the study, developed the methodology, performed the analysis, and wrote the manuscript. All authors contributed equally.

Corresponding authors

Correspondence to Ranjit Panigrahi or Akash Kumar Bhoi.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Awotunde, J.B., Panigrahi, R., Shukla, S. et al. Big data analytics enabled deep convolutional neural network for the diagnosis of cancer. Knowl Inf Syst 66, 905–931 (2024). https://doi.org/10.1007/s10115-023-01971-x

  • DOI: https://doi.org/10.1007/s10115-023-01971-x
