A mapping study of ensemble classification methods in lung cancer decision support systems

  • Review Article
Medical & Biological Engineering & Computing

Abstract

Achieving high classification accuracy on medical datasets is essential for building effective decision support systems that assist doctors in their work. In many domains of artificial intelligence, ensemble classification methods improve on the performance of single classifiers. This paper reports the state of the art of ensemble classification methods in lung cancer detection. We performed a systematic mapping study to identify the most relevant papers on this topic. A total of 65 papers published between 2000 and 2018 were selected after an automatic search in four digital libraries and a careful selection process. We observed that diagnosis was the most commonly studied task; homogeneous ensembles and decision trees were the most frequently adopted for constructing ensembles; and majority voting was the predominant combination rule. Few studies considered parameter tuning of the techniques used. These findings open several perspectives for researchers to enhance lung cancer research by addressing the identified gaps, such as investigating different classification methods, proposing other heterogeneous ensemble methods, and using new combination rules.
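To make the majority voting rule mentioned in the abstract concrete, the following is a minimal sketch in Python. It is an illustration only, not a method taken from any of the surveyed papers. The three base classifiers are hypothetical one-feature threshold rules, and the feature names (`diameter_mm`, `pack_years`, `age`), the thresholds, and the tie-breaking choice are all assumptions made for this example with no clinical meaning.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the largest number of base classifiers.

    Ties are broken in favour of the tied label that appears first in
    `predictions` (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk the predictions in input order so tie-breaking is deterministic.
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(classifiers: Sequence[Callable], sample) -> Hashable:
    """Apply every base classifier to `sample` and combine by majority vote."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical base classifiers: the same kind of learner (a threshold rule)
# applied to different features. Thresholds are invented for illustration.
base_classifiers = [
    lambda s: "malignant" if s["diameter_mm"] > 8 else "benign",
    lambda s: "malignant" if s["pack_years"] > 30 else "benign",
    lambda s: "malignant" if s["age"] > 65 else "benign",
]

# Made-up sample: two of the three rules vote "malignant".
sample = {"diameter_mm": 12, "pack_years": 10, "age": 70}
print(ensemble_predict(base_classifiers, sample))  # malignant
```

Because all three members are the same kind of rule, this toy ensemble is homogeneous in spirit; replacing the members with different learner types would give a heterogeneous ensemble, and replacing `majority_vote` with another function would correspond to a different combination rule.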

Main features of the mapping study of ensemble classification methods applied to lung cancer decision support systems

Funding

The work of the first author is partially supported by the European Commission through the Erasmus+ International Mobility Program (KA107). This work was partly supported by the Spanish MICINN, as well as the European Commission FEDER funds, under grants RTI2018-098156-B-C53 and RTI2018-098309-B-C33.

Author information

Contributions

All authors contributed to the study conception and design. The literature search and data analysis were performed by Mohamed Hosni, Ali Idri, and José Luis Fernández-Alemán. The first draft of the manuscript was written by Mohamed Hosni, Ginés García-Mateos, and Juan M. Carrillo-de-Gea. All authors commented on previous versions of the manuscript, and all authors read and approved the final manuscript.

Corresponding author

Correspondence to Ginés García-Mateos.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hosni, M., García-Mateos, G., Carrillo-de-Gea, J.M. et al. A mapping study of ensemble classification methods in lung cancer decision support systems. Med Biol Eng Comput 58, 2177–2193 (2020). https://doi.org/10.1007/s11517-020-02223-8

  • DOI: https://doi.org/10.1007/s11517-020-02223-8
