A survey on cancer prediction and detection with data analysis

  • Review Article
  • Innovations in Systems and Software Engineering

Abstract

The World Health Organization reports cancer as a leading cause of mortality and morbidity worldwide. Accurate and early cancer risk assessment in average- to high-risk populations is vital to controlling cancer-related suffering and mortality. Advanced bioinformatics and data mining techniques, along with computer-aided cancer prediction and risk assessment, are used extensively to help identify high-risk populations and to support individual cancer diagnosis and treatment. Early detection minimizes the risk of cancer spreading to secondary sites and ensures appropriate treatment at the onset of the malignancy. The scope of our survey was to review over 90 publications on data analysis for cancer prediction and detection. The motivation was to accumulate and categorize knowledge on the use of data analytics in this field. The aim was to compare a few of the major analytical approaches in cancer data analysis and highlight their effectiveness.

Author information

Corresponding author

Correspondence to Kartick Chandra Mondal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nath, A.S., Pal, A., Mukhopadhyay, S. et al. A survey on cancer prediction and detection with data analysis. Innovations Syst Softw Eng 16, 231–243 (2020). https://doi.org/10.1007/s11334-019-00350-6

  • DOI: https://doi.org/10.1007/s11334-019-00350-6
